Attorney Docket No. 017853-0145 
Application No. 10/799,782 



REMARKS 

The Applicants respectfully request reconsideration of the present application in view of 
the reasons that follow. 

I. Status of the claims 

No claims are added, canceled or amended in this paper. Accordingly, claims 5 and 6 are 
pending and under examination. 

II. Claim rejection - non-statutory, obviousness-type double patenting 

Claims 5-6 stand rejected on the grounds of non-statutory obviousness-type double 
patenting over claims 1-9 of U.S. Patent No. 5,851,999 ("the '999 Patent"). The Office Action 
asserts that claims directed to a pharmaceutical composition comprising an expression vector 
encoding a truncated FLK-1 polypeptide and a pharmaceutically acceptable carrier render 
obvious claims directed to a cell line comprising a recombinant vector encoding a truncated 
FLK-1 polypeptide. The Applicants respectfully traverse this ground for rejection. 

As noted in the previous reply, the proposed modification would change the principle of 
operation of the pharmaceutical composition. Additionally, a cell line is not an obvious variant 
or modification of a pharmaceutical composition. 

A. The proposed modification would change the principle of operation of the 
pharmaceutical composition 

Claims 1-9 of the '999 Patent relate to a pharmaceutical composition comprising an 
expression vector encoding a truncated FLK-1 polypeptide, and a pharmaceutically acceptable 
carrier. A pharmaceutical composition comprising an expression vector would have a different 
principle of operation than a pharmaceutical composition including a cell line. For example, a 
pharmaceutical composition comprising an expression vector would be administered to a patient 
(likely targeted to a specific cell population) for uptake by the patient's cells. Once within the 
targeted cell population, the encoded truncated polypeptide would be expressed within these 
specific cells. The cells expressing the truncated polypeptide would exhibit the dominant 
negative phenotype (e.g., inhibit the cellular effects of VEGF binding). In contrast, if a foreign 
cell line expressing FLK-1 were administered to a patient, the foreign cell line would express the 
polypeptide outside the target cells. The patient cells would have to somehow "uptake" the 
truncated polypeptide to elicit the dominant negative effect (i.e., inhibit the cellular effects of 
VEGF binding). 

B. A cell line is not an obvious variant or modification of a pharmaceutical 
composition 

Contrary to the Office Action assertions, one skilled in the art would not find a cell line 
comprising a recombinant vector to be obvious in light of a pharmaceutical composition 
comprising an expression vector and a pharmaceutically acceptable carrier. First, a 
pharmaceutical composition comprising an expression vector and a pharmaceutically acceptable 
carrier is clearly intended for administration to a patient. A cell line comprising a recombinant 
vector may not be intended for administration to a patient. 

Second, it is possible that administration of a cell line, as opposed to the pharmaceutical 
composition, could be harmful to the patient and may not yield therapeutic results. For example, 
it is likely that administering a foreign cell line to a patient would elicit an immune response 
from the patient. If the patient is suffering from cancer and has undergone chemotherapy, there is 
a high probability that the patient's immune system is already weakened. Challenging the patient 
by administering foreign cells would likely cause the patient to become even weaker. 
Furthermore, a cell line administered to a patient may not even express, or be capable of 
expressing, the truncated polypeptide after administration. Thus, one of ordinary skill in the art 
would not consider a cell line including a recombinant vector obvious in light of a 
pharmaceutical composition comprising an expression vector and a pharmaceutically acceptable 
carrier. 




Accordingly, for at least these reasons, reconsideration and withdrawal of the 
obviousness-type double patenting rejection is respectfully requested. 

III. Claim rejection - 35 U.S.C. § 103(a) 

Claims 5-6 are rejected under 35 U.S.C. § 103(a) as allegedly being unpatentable over 
U.S. Patent No. 5,185,438 ("Lemischka"), Matthews et al., PNAS, 88:9026 (1991) ("Matthews"), 
and Terman et al., BBRC, 187:1579 (1992) ("Terman"), in view of Ullrich et al., Cell, 61:203 
(1990) ("Ullrich"), and Ueno et al. (Science, 252: 844 (1991) ("Ueno-1") and JBC, 267:1470 
(1992) ("Ueno-2")). The Office Action asserts that, because Flk-1 allegedly shares some aspects 
of homology to PDGFR, FGFR1 and EGFR, the "invention follows the teaching of the prior 
art, and achieves exactly the result that would be predicted on the basis of the prior art," and that 
"there are no unexpected results." Office Action at page 5. Thus, the Office Action alleges that 
not only would the skilled artisan find it obvious to make the claimed mutant polypeptide (amino 
acids 1-806 of SEQ ID NO: 2), but that it would not be surprising that the claimed mutant 
polypeptide would fold into a conformation that could dimerize with a wild-type Flk-1 
polypeptide; that the dimer would bind VEGF; and that the dimer would prevent VEGF 
signaling. The Applicants respectfully traverse this ground for rejection and assert that only with 
impermissible hindsight would the skilled artisan consider the claimed invention "obvious." 

A. Protein function cannot accurately be predicted based on structural or 
functional homology 

Assigning protein function based on sequence homology is viewed with skepticism in the 
art. For example, Wells illustrates that changes in amino acid sequence, even a change of a few 
amino acids, can result in proteins with unpredictable function (Wells, Biochemistry, 29(37): 
8509-17 (1990), EXHIBIT A). Moreover, Attwood (Attwood, Science, 290: 471-473 (2000), 
EXHIBIT B) teaches that "[i]t is presumptuous to make functional assignments merely on the 
basis of some degree of similarity between sequences." Similarly, Skolnick et al. (Skolnick et 
al., Trends in Biotech., 18(1): 34-39 (2000), EXHIBIT C) teach that the skilled artisan is well 
aware that assigning functional activities for any particular protein or protein family based on 
sequence homology is inaccurate, in part because of the multifunctional nature of proteins (see 
e.g., Skolnick et al. at Abstract and "sequence-based approaches to functional prediction," page 
34).¹ 

In the present situation, there is very little homology between these polypeptides at the 
amino acid level. Exhibits D-F show protein BLAST alignments of SEQ ID NO: 2 amino acids 
1-806 against PDGFR (human; Accession Number NP_002600, EXHIBIT D), EGFR (human; 
Accession Number AAH94761, EXHIBIT E), and FGFR1 (human; Accession Number 
AAH15035, EXHIBIT F), as identified on the NCBI database. The alignments make clear 
that sequence homology between these polypeptides is minimal. Accordingly, the skilled artisan 
would not assume or reasonably believe that the truncated form of Flk-1 would behave in a 
manner similar to a truncated form of PDGFR, EGFR or FGFR1. 

Even in situations where there is some confidence of similar overall structure between 
two proteins, only experimental research can confirm the artisan's best guess as to the function of 
the structurally related proteins (see e.g., Skolnick, in particular Abstract and Box 2). It is well 
known in the art that predicting protein function from sequence data is extremely complex. 
While it may be that one or more deletions are generally possible in any given protein, the 
positions within the protein's sequence where such deletions can be made with a reasonable 
expectation of producing a desired or expected function should be determined empirically for 
each protein. Certain positions within a sequence are critical to the protein's structure/function 
relationship, e.g., various sites or regions directly involved in binding, activity and in 
providing the correct three-dimensional spatial orientation of binding and active sites (see e.g., 
Wells). 



¹ MPEP § 2124 states that "in some circumstances a factual reference need not antedate the filing date," for 
example, when describing a "scientific truism." Both Attwood and Skolnick present such scientific truisms. 




Accordingly, the Applicants respectfully assert that even in light of the combination of 
cited references, the skilled artisan would not have expected the claimed polypeptide - a deletion 
mutant of Flk-1 with very little amino acid homology to EGFR, FGFR or PDGFR - to form a 
three-dimensional structure capable of dimerizing with a wild-type Flk-1 polypeptide, to function 
with the wild-type polypeptide to bind VEGF, and to inhibit VEGF signaling function. As such, 
there would be no reason for the skilled and creative artisan to try to generate the claimed Flk-1 
mutant, and there would be no reasonable expectation that such a mutant would even be 
functional, let alone be beneficial. The Applicants respectfully assert that the claimed cell lines 
are not obvious in light of the combination of cited references. 

B. None of the cited references disclose that Flk-1 functions as a dimer; 
accordingly, activity of a Flk-1 mutant cannot be predicted based on the 
activity of kinase receptor dimer mutants 

The Office asserts that Terman teaches "that it would be desirable to investigate the 
dimeric combinations in which the receptor occurs; Terman does not doubt that the receptor is a 
dimer." Office Action at page 4. The Applicants respectfully disagree with the Examiner's 
characterizations regarding Terman. In fact, Terman states that "[i]t is not known whether KDR 
and flt can form functionally active dimers analogous to the PDGF receptor dimers" and "it is not 
known whether KDR, flt or heterodimer KDR/flt mediates mitogenic activity and/or vascular 
permeability." Terman at page 1585. Accordingly, the teaching of Terman does not substantiate 
the Examiner's assertions that Flk-1 was known to function as a dimer. 

The Office also asserts that "[w]ith respect to Ullrich, combination of the subunit of SEQ 
ID NO: 2 in vivo in a cell that expresses Flk-1 would inherently result in a combination with a 
"normal" subunit, regardless of whether the receptor were a homo- or hetero-dimer." Office 
Action at page 4. However, the claimed SEQ ID NO: 2 mutant is neither taught nor suggested in 
Ullrich. Accordingly, this argument is moot with respect to the claimed invention. 




The Office Action also asserts that Ullrich teaches that "[r]eceptor oligomerization is a 
universal phenomenon among growth factor receptors." Office Action at page 4. Ullrich does 
indeed make this statement. However, this statement, alone or in combination with any or all of 
the cited references, does not in any way teach or suggest that the claimed mutant polypeptide 
would be able to "oligomerize." 

The Office Action also cites Ullrich with respect to the function of various kinase 
mutants: "While the kinase activity of the various receptors was dispensable for their expression 
and targeting to the cell surface it was indispensable for signal transduction and induction of 
other early and delayed cellular responses. . . ." Office Action at page 4. Based on this statement, 
the Office Action concludes "without knowing the subunit structure of the receptor, one would 
expect that a subunit lacking the tyrosine kinase domain would have dominant negative signaling 
effects." Office Action at page 5. The Applicants respectfully disagree with the characterization 
of Ullrich and the conclusion reached. While Ullrich describes mutations of the EGF, insulin 
and PDGF receptors, Ullrich does not teach or suggest Flk-1 or Flk-1 mutations. Moreover, 
Ullrich in combination with the cited references does not teach or suggest that wild-type Flk-1 
functions as a dimer; accordingly, there is no reason one of skill in the art would conclude that 
the claimed mutant Flk-1 would form a dimer with its wild-type counterpart. Further, one of 
ordinary skill in the art would not conclude that "without knowing the subunit structure of the 
receptor, one would expect that a subunit [of Flk-1] lacking the tyrosine kinase domain would 
have dominant negative signaling effects." 

With respect to mutant dimerization and mutant-dimer function, the Office Action asserts 
that "Applicant are arguing limitations that are not found in the claims," and that "neither a 
dimer, nor any particular activity" is required in the claims. Office Action at page 5. The 
Applicants respectfully traverse this assertion. 




The Applicants assert that because the ability of wild-type Flk-1 to dimerize was 
unknown at the time of filing, the fact that the claimed mutant Flk-1 could dimerize, that a dimer 
including the claimed mutant Flk-1 would bind VEGF, and that a dimer including the 
claimed mutant Flk-1 would inhibit VEGF signaling is completely unexpected. 

For at least these reasons, the Applicants respectfully disagree with the Office Action 
assertions that one skilled in the art would find it obvious to make the truncated mutants, and to 
expect the claimed truncated polypeptide "to render endogenous wild-type Flk-1 unresponsive to 
VEGF and inhibit the cellular effects of VEGF binding." The Applicants respectfully contend 
that, for the claimed Flk-1 sequence, dominant negative inhibition of cellular effects of VEGF 
binding would not have been obvious to a skilled artisan. Specifically, one of ordinary skill in 
the art would not have expected the claimed polypeptide to "render endogenous wild-type Flk-1 
unresponsive to VEGF" as recited in the claims because even the wild-type Flk-1 was not known 
to function as a dimer. 

Accordingly, reconsideration and withdrawal of the rejection under 35 U.S.C. § 103(a) is 
respectfully requested. 

C. Because flt was a known VEGF receptor with high affinity for VEGF, it was 
unexpected that the Flk-1 mutant would inhibit the cellular effects of VEGF 
binding 

In addition, it was entirely unexpected that the truncated Flk-1 variant would have an 
inhibitory effect on the cellular response to VEGF. Such results were unexpected because at least 
one other receptor, flt, was known to bind VEGF with high affinity.² It also was known that flt is 
expressed in endothelial cells of a growing tumor.³ Significantly, flt has a 50-fold higher affinity 
for VEGF than Flk-1.⁴ Given the importance of VEGF signaling in angiogenesis during, inter alia, 
development, wound healing and organ regeneration, some redundancy in the system would be 
expected. Consequently, the skilled artisan would not have expected that blocking the Flk-1 
signaling pathway would shut down the cellular response to VEGF, resulting in suppression of 
angiogenesis and inhibition of tumor growth. Rather, one of ordinary skill in the art would have 
anticipated that the biological response to VEGF, such as the proliferation of blood vessels, would 
still be transduced through flt or some as yet undiscovered receptors. 

² See e.g., Terman et al., BBRC, 187:1579 (1992). 

³ See Plate et al., Nature, 359: 845-848 (1992); Plate et al., Cancer Research, 53: 5822-5827 (1993) (Ref. A35). 

⁴ See Waltenberger et al., J. Biol. Chem., 269: 26988-26995 (1994). 

For at least these reasons, the ability of the claimed truncated Flk-1 receptor proteins to 
inhibit angiogenesis was unexpected. Accordingly, reconsideration and withdrawal of the 
rejection is respectfully requested. 

IV. Conclusion 

The Examiner is invited to contact the undersigned by telephone if it is felt that a 
telephone interview would advance the prosecution of the present application. 

The Commissioner is hereby authorized to charge any additional fees which may be 
required regarding this application under 37 C.F.R. §§ 1.16-1.17, or credit any overpayment, to 
Deposit Account No. 19-0741. Should no proper payment be enclosed herewith, as by the credit 
card payment instructions in EFS-Web being incorrect or absent, resulting in a rejected or 
incorrect credit card transaction, the Commissioner is authorized to charge the unpaid amount to 
Deposit Account No. 19-0741. 




If any extensions of time are needed for timely acceptance of papers submitted herewith, 
the Applicants hereby petition for such extension under 37 C.F.R. § 1.136 and authorize payment 
of any such extension fees to Deposit Account No. 19-0741. 

Respectfully submitted, 



Date: February 18, 2010 

FOLEY & LARDNER LLP 
Customer Number: 22428 
Telephone: (202) 672-5538 
Facsimile: (202) 672-5399 



By: /Stephanie H. Vavra/ Reg. No. 45,178 

for Michele M. Simkin 
Attorney for the Applicants 
Registration No. 34,717 



MILW_9796531 






Biochemistry 



© Copyright 1990 by the American Chemical Society 



Volume 29, Number 37 September 18, 1990 



Perspectives in Biochemistry 



Additivity of Mutational Effects in Proteins 

James A. Wells 

Protein Engineering Department, Genentech, Inc., 460 Point San Bruno Boulevard, South San Francisco, California 94080 
Received April 19, 1990; Revised Manuscript Received May 29, 1990 



The energetics of virtually all binding functions in proteins 
is the culmination of a set of molecular interactions. For 
example, removal of a single molecular contact by a point 
mutation causes relatively small reductions (typically 0.5-5 
kcal/mol) in the free energy of transition-state stabilization 
[for reviews see Fersht (1987) and Wells and Estell (1988)], 
protein-protein interactions (Laskowski et al., 1983, 1989; 
Ackers & Smith, 1985), or protein stability [for review see 
Matthews (1987)] compared to the overall free energy 
associated with these functional properties (usually 5-20 kcal/mol). 
Thus, it is possible to modulate protein function by mutation 
at many contact sites. In fact, to design large changes in 
function will often require mutation of more than one 
functional residue. 

There is now a large data base for free energy changes that 
result when single mutants are combined. A review of these 
data shows that, in the majority of cases, the sum of the free 
energy changes derived from the single mutations is nearly 
equal to the free energy change measured in the multiple 
mutant. However, there are two major exceptions where such 
simple additivity breaks down. The first is where the mutated 
residues interact with each other, by direct contact or indirectly 
through electrostatic interactions or structural perturbations, 
so that they no longer behave independently. The second is 
where the mutation causes a change in mechanism or 
rate-limiting step of the reaction. It is important to note that the 
additive effects discussed here do not change the molecularity 
of their respective reactions. When the molecularity of the 
reaction changes [as in comparing the free energy of binding 
of one linked substrate (A-B) versus the sum of two fragments 
(A plus B)], large deviations from simple additivity can result 
from entropic effects (Jencks, 1981). Although the focus here 
is on enzyme activity, similar conclusions may be drawn from 
mutations affecting protein-protein interactions, protein-DNA 
recognition, or protein stability. Some practical examples and 
applications are discussed. 

Additivity Relationships 

The change in free energy of a functional property caused 
by a mutation at site X is typically expressed relative to that 
of the wild-type protein as $\Delta\Delta G_{(X)}$. Such free energy changes 
for two single mutants (X and Y) can be related to those of 
a double mutant (designated X,Y) by eq 1 (Carter et al., 1984; 
Ackers & Smith, 1985). 

$$\Delta\Delta G_{(X,Y)} = \Delta\Delta G_{(X)} + \Delta\Delta G_{(Y)} + \Delta G_I \qquad (1)$$

The $\Delta G_I$ term (also called the 
coupling energy; Carter et al., 1984) should reflect the extent 
to which the change in energy of interaction between sites X 
and Y affects the functional property measured. It is possible 
for $\Delta G_I$ to be either positive or negative depending upon 
whether the interactions between the mutant side chains reduce 
or enhance the functional property measured. Furthermore, 
the $\Delta G_I$ term should not exceed the free energy of interaction 
between side chains at sites X and Y except in cases where 
these mutations cause large structural perturbations. This was 
first applied to evaluating the functional independence of 
residues mutated in tyrosyl-tRNA synthetase (Carter et al., 
1984). In one case the sum of the $\Delta\Delta G$ values for single 
mutants was equal to that of the double mutant, indicating 
the sites functioned independently; in another example there 
was a large discrepancy, suggesting the sites were interacting. 

Simple Additivity in Transition-State Binding 
Interactions 

The strengths of noncovalent interactions are strongly 
dependent upon the nature of the two groups and the distance 
($r$) between them. For example, the free energy of 
charge-charge, random charge-dipole, random dipole-dipole, van der 
Waals attraction, and repulsion decay as $1/r$, $1/r^4$, $1/r^6$, $1/r^6$, 
and $1/r^{12}$, respectively [for review see Fersht (1985)]. Thus, 
when the side chains at sites X and Y are remote to one 
another and assuming no large structural perturbations, the 
$\Delta G_I$ term should be negligible and eq 1 thus simplifies to 

$$\Delta\Delta G_{(X,Y)} \approx \Delta\Delta G_{(X)} + \Delta\Delta G_{(Y)} \qquad (2)$$

This situation, here referred to as simple additivity, is generally 
observed except where side chains are close to each other or 
when one or both of the mutants change the rate-limiting step 
or reaction mechanism. These principles are well illustrated 
from data of additive mutational effects on transition-state 
stabilization energies. 
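
As an illustrative aside, the additivity relationship can be checked numerically. The minimal Python sketch below assumes that the coupling energy is obtained by rearranging eq 1, that is, the free energy change of the double mutant minus the sum of the changes for the two single mutants. The function names and the 0.5 kcal/mol tolerance are hypothetical choices made only for this example. The input values are those listed in Table I for the C35G + H48G double mutant of tyrosyl-tRNA synthetase in the ATP/PPi assay.

```python
# Illustrative sketch only: function names and the tolerance are assumptions,
# not taken from the article.

def coupling_energy(ddg_x, ddg_y, ddg_xy):
    """Return dG_I = ddG(X,Y) - [ddG(X) + ddG(Y)] (eq 1 rearranged), in kcal/mol."""
    return ddg_xy - (ddg_x + ddg_y)


def is_simply_additive(ddg_x, ddg_y, ddg_xy, tolerance=0.5):
    """True when |dG_I| falls within a chosen tolerance (hypothetical cutoff)."""
    return abs(coupling_energy(ddg_x, ddg_y, ddg_xy)) <= tolerance


# C35G + H48G, ATP/PPi assay: single mutants +1.20 and +1.04, double mutant +2.30
dg_i = coupling_energy(1.20, 1.04, 2.30)
print(f"coupling energy: {dg_i:+.2f} kcal/mol")
print("simply additive:", is_simply_additive(1.20, 1.04, 2.30))
```

A coupling energy near zero corresponds to the simple additivity described by eq 2. What counts as "near zero" depends on experimental error, so the tolerance should be treated as an adjustable assumption.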







[Figure 1. Horizontal axis: $\sum \Delta\Delta G_T^{\ddagger}$, components] 
FIGURE 1: Plot of the changes in transition-state stabilization energies 
for the multiple mutant versus the sum for the component mutants. 
Data are taken from Table I and represent mutants from subtilisin 
(■), tyrosyl-tRNA synthetase (O), trypsin (n), DHFR (•), and 
glutathione reductase (a), where mutant or wild-type side chains 
should not contact one another. The dashed line has a slope of 1, 
and the solid line is a best fit to all the data. 

Changes in transition-state stabilization energy ($\Delta\Delta G_T^{\ddagger}$) 
caused by a mutation can be calculated from eq 3 (Wilkinson 
et al., 1983), in which $R$ is the gas constant, $T$ is the absolute 
temperature, $k_{cat}$ is the turnover number, and $K_M$ is the 
Michaelis constant for the mutant and wild-type enzyme against 
a fixed substrate. 

$$\Delta\Delta G_T^{\ddagger} = -RT \ln \frac{(k_{cat}/K_M)_{\text{mutant}}}{(k_{cat}/K_M)_{\text{wild type}}} \qquad (3)$$

$\Delta\Delta G_T^{\ddagger}$ represents the change in free energy 
to reach the transition-state complex (E·S‡) from the free 
enzyme and substrate (E + S). 
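
A short numerical sketch may help make the transition-state stabilization energy concrete. The Python example below assumes the relation usually attributed to Wilkinson et al. (1983), in which the change in free energy equals -RT times the natural logarithm of the ratio of kcat/KM for the mutant to kcat/KM for the wild-type enzyme. The function name, the default temperature of 298.15 K, and the kcat/KM values are hypothetical and chosen only for illustration.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987e-3


def ddg_transition_state(kcat_km_mutant, kcat_km_wild_type, temperature_k=298.15):
    """Return -RT ln[(kcat/KM)_mutant / (kcat/KM)_wild type] in kcal/mol.

    Assumed form of eq 3; both kcat/KM values must share the same units.
    """
    if kcat_km_mutant <= 0 or kcat_km_wild_type <= 0:
        raise ValueError("kcat/KM values must be positive")
    ratio = kcat_km_mutant / kcat_km_wild_type
    return -R_KCAL * temperature_k * math.log(ratio)


# Hypothetical example: a mutant whose kcat/KM is 10-fold lower than wild type
ddg = ddg_transition_state(1.0e4, 1.0e5)
print(f"ddG_T = {ddg:+.2f} kcal/mol")
```

Under this sign convention, a mutant with lower catalytic efficiency than wild type gives a positive value and an improved mutant gives a negative value, which matches the signs used in Table I.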

To analyze the proposition that the interaction energy term, 
$\Delta G_I$, is relatively small when the sites of mutation (X and 
Y) are remote to one another, $\Delta\Delta G_T^{\ddagger}$ values were collected 
from the literature where side-chain substitutions in the 
multiple mutant are beyond van der Waals contact (>4 Å 
distant) from each other (Table I). There are at least 25 
examples distributed across five different enzymes where 
$\Delta\Delta G_T^{\ddagger}$ values can be calculated for the individual and multiple 
mutants assayed in at least two different ways. Among these 
are examples where electrostatic interactions, hydrogen 
bonding, and steric and hydrophobic effects have been altered 
separately or in combination with others. The X-ray structures 
of the wild-type proteins show that the wild-type side chains 
are not in contact. Modeling suggests the mutant side chains 
are beyond possible van der Waals contact unless the mutant 
side chains were to cause significant changes in the overall 
protein structure. Such large changes are rarely observed in 
structures of site-specific mutant proteins (Katz & Kossiakoff, 
1986; Alber et al., 1987; Howell et al., 1986; Wilde et al., 
1988) or even highly variant natural proteins (Chothia & Lesk, 
1986). 

A collective plot of the sum of the $\Delta\Delta G_T^{\ddagger}$ values for the 
component mutants versus the corresponding multiple mutant 
(Table I) gives a remarkably strong correlation ($R^2$ ~ 0.92) 
with a slope near unity (Figure 1). The simplest interpretation 
is that the interaction term, $\Delta G_I$, is small compared to the 
overall effects on $\Delta\Delta G_{(X,Y)}$. It is formally possible that there 
are large and compensating effects between side chains X and 
Y that systematically lead to small net values for $\Delta G_I$. 

There are some notable exceptions that weaken the 
correlation within the data set (Table I). In particular, combining 
the R204L mutation in Escherichia coli glutathione reductase 
gives a less than additive effect, especially when combined with 
another mutant, R198M (Scrutton et al., 1990). These basic 
residues are not in direct contact, but both side chains form 
a salt bridge with the 2'-phosphate group of NADPH. Indeed, 
the largest discrepancies are when these mutants are assayed 
with NADPH as compared to NADH. Similarly, the sum 
of the $\Delta\Delta G_T^{\ddagger}$ values for two positively charged component 
mutants in subtilisin (D99K and E156K) overestimates the 
effect of the multiple mutant when assayed with an Arg but 
not with a Phe substrate (Russell & Fersht, 1987). Such 
discrepancies are not too surprising because charge-charge 
interactions fall off as $1/r$ and can exhibit long-range effects 
in proteins [for example, see Russell and Fersht (1988)]. The 
physical basis for other large discrepancies not involving 
electrostatic substitutions is less clear but may involve 
unexpectedly large structural changes or changes in enzyme 
mechanism (see below). 

These additivity tests are not particularly dominated by one 
of the single mutants in the sum. The average contribution 
(±SE) for the most dominant mutant in each sum calculated 
from the 69 additivity tests given in Table I is only 68% 
(±15%) of the total sum (theoretical is ~50%). Furthermore, 
the plot in Figure 1 is not analogous to graphs of correlated 
variables, where A is plotted versus the sum of A + B, because 
in Figure 1 the values on the y-axis are determined 
independently from those on the x-axis. 

Complex Additivity in Transition-State 
Stabilization: When $\Delta G_I \neq 0$ 

(A) Change in Interaction Energy between Sites X and Y. 
Where residues X and Y are close enough to contact, it is more 
likely that the $\Delta G_I$ term will be significant. There are 11 
examples collectively from tyrosyl-tRNA synthetase and 
subtilisin that fit this category (Table II). 

A series of mutants in tyrosyl-tRNA synthetase at positions 
48 and 51 (Carter et al., 1984; Lowe et al., 1985) show 
complex additivity (Table II). His48 and Thr51 in the wild-type 
structure are next to each other on adjacent turns of an α-helix. 
His48 hydrogen bonds to the ribose ring oxygen of ATP while 
Thr51 can make van der Waals contact with ATP. The T51P 
mutation increases the catalytic efficiency of the enzyme in 
some assays by more than -2 kcal/mol (Wilkinson et al., 
1984). However, when this mutation is combined with 
mutations at position 48, the effects are not simply additive. An 
X-ray structure of the T51P mutant indicates there are no 
structural changes in the α-helix (Brown et al., 1987). Instead, 
it is suggested that the T51P mutant is improved over wild 
type because the wild-type enzyme contains a bound water in 
the vicinity of Thr51 that disfavors substrate binding. Blow 
and co-workers (Brown et al., 1987) argue that the change 
in solvent structure propagated to position 48 may account for 
the complex additivity. In the previous section, the double 
mutant (H48G,T51A) exhibited nearly simple additivity 
(Table I). Presumably, the smaller and less hydrophobic 
alanine substitution at position 51 should not introduce as large 
a change in solvent structure as the pyrrolidone ring of proline. 

In the case of subtilisin (Table II), Glu156 is near the top 
of the P1 binding crevice while Gly166 is at the bottom. In 
the wild-type enzyme these sites do not make direct van der 
Waals contact, but large side chains substituted at position 
166 can be modeled to contact the residue at position 156. In 
fact, X-ray structural analysis shows that an Asn side chain 
at position 166 makes a good hydrogen bond with Glu156 
(Bott et al., 1987). Moreover, all of the substitutions are polar 
or charged, the energetics of which are expected to be the most 
long range. Thus, the mutant side chains alter substantially 
the intramolecular interactions between positions 156 and 166. 






Table I: Comparison of Sums of $\Delta\Delta G_T^{\ddagger}$ from Component Mutants vs the Multiple Mutant Where the Mutant or 
Wild-Type Side Chains Do Not Contact One Another 


AA<? T * 






AAC/ 


gj^y component mutants nltr 


multiple 


iIMy component mi 


sum IS, 



Tyrosyl-tRNA Synthetase 



Subtilisin BPN' 





C35G + H48G' 








D99K + E156K 






ATP/PP, 


+1.20 +1.04 


+2.24 


+2.30 


R 


+1.29 +2.12 


+3.41 


+2.74 


ATP/tRNA 


+ 1.05 +1.13 


+2.18 


+1.68 


F 


+0.13 -0.49 


-0.36 


-0.42 












E156S, 






Tyr/tRNA 


+6.22 + U2 


+1.45 


+1.20 




G166A + G169A, 






C35G +T51P 








Y217L' 






ATP/PP, 


+ 1.20 -1.91 


-0.71 


-1.14 


F 


-0.40 -1.46 


-1.86 


-1.76 










Y 


+0.94 -1.03 


-0.09 






+ U4 


+0^50 






..... , S24C, 
G166A+ H64A 






yr/ 


+0.32 +0.50 
C35G + T5IC* 


+0.82 




F 


-0.40 +4.96 


+4.56 


+4.11 


ATP/tRNA 


+ 1.05 -0.93 


+0.12 


-0.22 


Y 


+0.94 +4.40 


+5.34 


+5.84 


ATP/Tyr 


+ 1.14 -0.91 


+0.23 


-0.13 




E156S. S24C 

gi69a. + ivr? 






H48N + T51A' 












ATP/PP, 
ATP/tRNA 


+0.26 -0.38 
-0.13 -0.32 
T40A + H45G' 


-0.12 
-0.45 


+0.04 
-0.37 


F 


Y217L 

-1.46 +4.96 
-1.03 +4.40 
S24C, 

H64A, +OI „. 
GI69A, + 0166A 
Y2I7L 


+3.50 
+3.37 


+4.21 
+3.96 




+5.02 +3.15 


+8.17 












+5.13 +2.44 

Rat Trypsin 


+7.57 










K 
R 


G216A + G226A' 
+2.75 +3.13 
+2.19 +4.91 


+5.88 
+7.10 


+5.07 
+5.90 


F 
Y 


+4.21 -0.40 
+3.96 +0.94 
S24C, E156S, 
H64A, + G169A, 


+3.81 
+4.90 


+3.53 
+6.07 




Dihydrofolate Reductase (AAGh-hu) 






G166A Y217L 








F31V + L54C 






F 


+4.11 -M6 




HjF 


+ 1.6 +2.9 


+4.5 


+4.5 


Y 


+J.84 -1.03 
E5S6S, 
S24C, + G166A. 
H64A G169A 


+4.81 


+6.07 


MTX 


+2.2 +2.9 

Subtilisin BPN' 


+5.1 


+4.5 








E 
0 


E156S+ Y217L + G169A' 
-1.43 -0.87 -0.62 
-0.60 -0.36 -0.32 


-2.92 
-1,28 


-2.06 
-1.14 


F 


Y217L 
+4.96 -1.76 
+4.40 +0.02 


+3.20 


+3.53 


A 


-0.15 -0.41 -0.27 


-0.83 


-0.92 




E. coii Glutathione Ream 


ctase 




K 


+ 1.70 -0.08 -O.30 


+1.32 


+0.87 




A179G + R198W 






M 


-0.86 -0.32 -0.39 


-1.57 


-1.41 


NADH 


-1.10 -0.62 


-1.72 


-1.32 


F 


-0.61 -0.29 -0.66 


-1.56 


-1.17 


NADPH 


+0.08 +2.68 


+2.76 


+2.11 


Y 


-0.24 -0.12 -0.41 


-0.77 


-0.59 




A179G + R204L 








EI56S + Y217L 






NADH 


-1.10 +0.41 


-0.69 


-1.54 


E 


-1.43 -0.87 


-2.30 


-1.67 


NADPH 


+0.08 +2.42 


+2.50 


+0.87 


Q 


-0.60 -0.36 


-0.96 


-0.96 




R198M + R204L 








-0.15 -0.41 


-0.56 






-0.62 +0.41 


-0.21 


-0.51 


K 


+1.70 -0.08 


+ 1.62 


+1.33 


NADPH 


+2.68 +2.42 


+5.10 


+3.70 


M 


-0.86 -0.32 


-1.18 


-1,11 










F 


-0.61 -0.29 


-0.90 


-0.84 




A170G + R.79M. 






Y 


-0.24 -0.12 


-0.36 


-0.32 


NADH 


-1.10 -0.51 


-1.61 


-1.72 


E 


EI56S, . 

Y217L + G169A 
-1.67 -0.62 


-2.29 


-2.06 


NADPH 


+0.08 +3.70 
R.98M+££G. 


+3.78 


+2.22 


Q 


-0.96 -0J2 


-1.28 


-1.14 


NADH 


-0.62 -1.54 


-2.16 


-1.72 


A 


-0.53 -0.27 


-0.80 


-0.92 


NADPH 


+2.68 +0.87 


+3.55 


+2.22 


K 
M 


+ 1.33 -0.3C 
-1.11 -0.39 


+1.03 
-1.50 


+0.87 
-1.41 






F 


-0.84 -0.66 


-1.50 


-1.17 


NADH 


+0.41 -1.32 


-0.91 


-1.72 


Y 


-0.32 -0.41 
D99S + E156S* 


-0.73 


-0.59 


NADPH 


+2.42 +2.11 

R179G + R198M + R204L 


+4.53 


+2.22 


R 


+0.47 +0.77 


+1.24 


+ 1.52 


NADH 


-1.10 -0.62 +0.41 


-1.31 


-1.72 


F 


0 -0.62 


-0.62 


-0.52 


NADPH 


+0.08 +2.68 +2.42 


+5.18 


+2.22 



a Carter et al. (1984). The assays refer to measurements of ATP-dependent pyrophosphate exchange (ATP/PPi) or tRNA charging (ATP/tRNA) under saturating conditions for tyrosine and vice versa for Tyr/PPi exchange and Tyr/tRNA charging. b Lowe et al. (1985). The ATP/Tyr activation assay refers to formation of tyrosyl adenylate under saturating concentrations of tyrosine. c Jones et al. (1986). d Leatherbarrow et al. (1985). The ATP/Tyr and Tyr/Tyr activation assays refer to formation of tyrosyl adenylate under pre-steady-state conditions, and $k_{cat}/K_M$ is calculated from $k_3/K_s$ for tyrosine and ATP, respectively. e Craik et al. (1985). The substrate was D-Val-Leu-(X)-aminofluorocoumarin where the P1 residue (X) is either Lys (K) or Arg (R). f Mayer et al. (1986). The ligand was either dihydrofolate (H2F) or methotrexate (MTX). g Wells et al. (1987a). The substrate was succinyl-L-Ala-L-Ala-L-Pro-L-(X)-p-nitroanilide where the P1 (X) residue (Schechter & Berger, 1967) was either Glu (E), Gln (Q), Ala (A), Lys (K), Met (M), Phe (F), or Tyr (Y). h Russell and Fersht (1987). The substrate was benzoyl-L-Val-Gly-L-Arg-p-nitroanilide (R) or succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (F). i Carter et al. (1989). The substrate was succinyl-L-Phe-L-Ala-L-His-L-(X)-p-nitroanilide where X was either Phe (F) or Tyr (Y). j Scrutton et al. (1990). The assay followed the reduction of oxidized glutathione by NADH or NADPH.






Table II: Comparison of Sums of $\Delta\Delta G_T^{\ddagger}$ from Component Mutants
vs the Multiple Mutant Where the Mutant Side Chains Can Contact
One Another





component 






assay* 


mutanti 




multiple mutant 




Tyrosyl-tRNA Synthetase 


















+ 1.07 


ATP/ANA 














+ L02 


Tyr/tRNA 


+ 1.12 +0.50 


+ 1.63 


+0.17 




+0.95 -1.99 


-1.04 


+ 1.04 




+ 1.07 -0.38 


+0.69 


+0.82 


H48N + T51P 






ATP/Tyr 






-0.76 


Tyr/Tyr 


+0.36 -0.38 




-0.64 


ATP/tRNA 


-0.02 -2.23 


-2.1S 


-1.07 


N48G + T51P 








+0.37 -0.94 






Tyr/Tyr 








ATP/tRNA 


+ {.16 -1.05 


+0l2l 


+0.90 






















ATP/tRNA 






-2.23 


H48Q + T51P 






















ATP/tRNA 


Subtilisin BPN' 

E156Q + ciwry 






















K 


+2.15 +0.53 
E156S + G166D 




W26 




















E156Q + G166N 








-1.71 -0.11 


-1.82 


-069 




-1.04 +0.14 


-0.9O 






-0.45 +0.18 


-0.27 






+2.15 +0.48 
E156S + G166N 


+2.73 




E 


-1.44 -0.11 


-1.55 


-0.51 


Q 


-0.59 +0.14 


-0.45 


-0.85 


M 


-0,85 +0.18 


-0.67 


-0.78 


K 


+1.68 +0.48 
EI56S + 0166K 


+2.16 


+1.26 


E 


-1.44 -3.49 


-4.93 


-4.49 


0 


-0.59 -1.03 


-1.62 


-0.95 


M 


-0.85 -1.37 


-2.22 


-1.12 


K 


+ 1.68 +0.51 
E156Q + GI66K 


+2.19 


+ 1.88 


E 


-1.71 -3.49 


-5.20 


-4.49 




-1.04 -1.03 


-2.07 


-0.95 


M 


-0.45 -1.37 


-1.82 


-1.12 


K 


+2.15 +0.51 


+2.66 


+ 1.88 



In these six examples there are large and systematic discrepancies
between the sum of the $\Delta\Delta G_T^{\ddagger}$ values for the single
mutants and those of the corresponding double mutant (Wells
et al., 1987b). In almost all cases, the sum of the $\Delta\Delta G_T^{\ddagger}$ values
for the single mutants is much greater than the value for the
multiple mutant. Nonetheless, the $\Delta\Delta G_T^{\ddagger}$ value predicted from
the sum of the single mutants does have the same sign as that
for the double mutant, so that the single mutants predict
qualitatively the effect on the multiple mutant.

A plot (Figure 2) of the collective data set from Table II
is in contrast to that seen in Figure 1. The $\Delta\Delta G_T^{\ddagger}$ values for
the multiple mutants correlate more poorly with the sum of




$\sum \Delta\Delta G_T^{\ddagger}$ components

FIGURE 2: Data are taken from Table II for mutants of subtilisin (■)
or tyrosyl-tRNA synthetase (○) where mutant or wild-type side chains
can contact each other. The dashed line represents a theoretical line
of unity slope, and the solid line represents the best fit.

the component single mutants ($R^2$ = 0.72). Moreover, the
slope of the line (0.61) is much below unity. This indicates
that the function of one residue is compromised by mutation
of another. Of the 40 additivity examples, the average contribution
of the most dominant single mutant to the sum of
the $\Delta\Delta G_T^{\ddagger}$ values is 71% (±13%) of the total. Thus (as in
Figure 1), both single mutants can contribute substantially
to free energy changes measured in the multiple mutant.
However, this data set is derived from mutations at only two
different sites on two different proteins.

In summary, complex additivity can be observed when 
mutations at sites X and Y change the intramolecular inter- 
action energy between sites. This can be mediated by direct 
steric, electrostatic, hydrogen-bonding, or hydrophobic in- 
teractions or indirectly through large structural changes in the 
protein, solvent shell, or electrostatic interactions. Complex 
additivity is most likely to occur where the sites of mutation 
are very close together and larger or chemically divergent side 



(5) Mutations at Sites X or Y Change the Enzyme 
Mechanism or Rate-Limiting Step. If the catalytic functions 
of two or more residues are interdependent, then a mutation 
of one residue can affect the functioning of the other(s). This 
form of complex additivity is well illustrated for mutations in 
the catalytic triad and oxyanion binding site of subtilisin 
(Carter & Wells, 1988, 1990). In the catalytic mechanism 
of subtilisin (Figure 3), the rate-limiting step in amide bond
hydrolysis is transfer of the proton from Ser221 to His64 with
nucleophilic attack upon the scissile carbonyl carbon. This
is accompanied by electrostatic stabilization of the protonated
imidazole by Asp32 and hydrogen bonding to the oxyanion
by the side chain of Asn155 and the main-chain amide of
Ser221. Mutational analysis shows that once the catalytic 
Ser221 is mutated to Ala (S221A), additional mutations in 
the triad or oxyanion binding site cause no further loss in 
catalytic efficiency (Table III). 
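
As an illustration of the breakdown in additivity described above, the following minimal sketch computes, for a few catalytic triad mutants, the difference between the value measured for the multiple mutant and the sum of its component values. The numbers (kcal/mol) are transcribed from Table III as they appear in this copy; the function and variable names are hypothetical and are not part of the original work.

```python
# Hypothetical sketch: how far a multiple mutant departs from simple additivity.
# All values are in kcal/mol, transcribed from Table III (subtilisin BPN').

# name -> (values for the component mutants, value for the multiple mutant)
TABLE_III = {
    "S221A,H64A": ((8.93, 8.84), 8.83),
    "S221A,D32A": ((8.93, 6.52), 8.86),
    "H64A,D32A": ((8.84, 6.52), 7.48),
    "S221A,H64A,D32A": ((8.93, 8.84, 6.52), 8.65),
}


def interaction_energy(components, multiple):
    """Return the multiple-mutant value minus the sum of its components.

    A value near zero means simple additivity; a large magnitude means
    complex additivity.
    """
    return multiple - sum(components)


def summarize(table):
    """Map each mutant name to (sum of components, interaction energy)."""
    return {
        name: (sum(components), interaction_energy(components, multiple))
        for name, (components, multiple) in table.items()
    }


if __name__ == "__main__":
    for name, (total, delta_g_i) in summarize(TABLE_III).items():
        print(f"{name:18s} sum = {total:+6.2f}  interaction = {delta_g_i:+6.2f}")
```

For every entry the interaction term is large and negative, which is the numerical signature of the statement that, once Ser221 is removed, further mutations in the triad cost little additional catalytic efficiency.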

The S221A enzyme retains a catalytic activity that is still 
1 0* above the solution hydrolysis rate (Carter & Wells, 1988). 
It is proposed that this residual activity is derived from re- 
maining transition-state binding contacts outside of the cat- 
alytic triad coupled with solvent attack upon the carbonyl 
carbon from the face opposite position 221 (Carter & Wells, 
1990). This proposal is based on a model showing that there 
is no room for a water molecule near Ala221 once the substrate
is bound. Furthermore, conversion of Asn155 to Gly enhances
the activity of the S221A mutant by -1.2 kcal/mol (Table III).







FIGURE 3 (caption fragment): Reproduced with permission from Carter and Wells (1988). Copyright 1988 ...

Table III: Comparison of Sums of $\Delta\Delta G_T^{\ddagger}$ from Component Mutants
vs the $\Delta\Delta G_T^{\ddagger}$ for Multiple Mutants in the Catalytic Triad and
Oxyanion Binding Site of Subtilisin BPN'^a



mutant                  components (kcal/mol)   sum of components   multiple mutant

S221A + H64A^b          +8.93 +8.84             +17.76              +8.83
S221A + D32A            +8.93 +6.52             +15.45              +8.86
H64A + D32A             +8.84 +6.52             +15.36              +7.48
S221A + H64A + D32A     +8.93 +8.84 +6.52       +24.29              +8.65
S221A + H64A,D32A       +8.93 +7.48             +16.40              +8.65
H64A + S221A,D32A       +8.84 +8.86             +17.70              +8.65
D32A + S221A,H64A       +6.52 +8.83             +15.35              +8.65
S221A + N155G           +8.93 +3.08             +12.01              +7.70



a Assayed with the substrate succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide. b Carter and Wells (1988). c Carter and
Wells (1990).

This is consistent with the opposite-face solvent attack
mechanism of S221A, because the oxyanion (Figure 3) would
develop away from Asn155 and the N155G mutation improves
solvent accessibility to the scissile carbonyl carbon.

Complex additivity is also seen for subtilisin mutated at
positions 64 and 32. The double (H64A,D32A) and corresponding
single mutants show a linear dependence upon hydroxide
ion concentration (between pH 8 and 10) that may
reflect hydroxide assistance in the deprotonation of the Oγ
of Ser221 (Carter & Wells, 1988). Thus, once His64 is
converted to Ala, Asp32 is a liability, presumably by electrostatic
repulsion of hydroxide ion. [Note the -1.3 kcal/mol
improvement in $\Delta\Delta G_T^{\ddagger}$ for the double mutant (H64A,D32A)
compared to H64A alone; Table III.]

In summary, if an enzyme mechanism relies upon cooperative
interaction between two or more residues, then multiple
mutations within this subset can result in large values for
$\Delta G_I$. In fact, if the mechanism is changed substantially,
residues that were a catalytic asset can become a liability.
Simple additivity can also break down when one or more of 
the mutations cause a change in the rate-limiting step. In an
extreme case, one may have a number of mutants in an enzyme
that enhance the activity, but the cumulative enhancement of
activity could not go beyond the diffusion-controlled limit
(Albery & Knowles, 1976).

Additive Effects on Substrate Binding 

The analysis above considered changes in binding free energies
between the free enzyme and substrate (E + S) to yield
the bound transition-state complex (E·S‡). The steady-state
kinetic analysis for subtilisin and tyrosyl-tRNA synthetase is
such that the $K_M$ values approximate the enzyme-substrate
dissociation constant. Additivity analysis based on calculations
of $\Delta\Delta G_{binding}$ (from $K_M$ values) or $\Delta\Delta G_{cat}$ (from $k_{cat}$
values) yields qualitatively the same results (not shown) as
shown in Tables I and II and Figures 1 and 2. Thus, deviations
from simple additivity are not systematically found in either
the energetics to form the E·S complex or those to reach E·S‡.

Additive Effects on Protein-Protein Interactions 

The first clear examples of additive binding effects caused
by amino acid replacements in proteins were reported by
Laskowski et al. (1983) and reviewed by others (Ackers &
Smith, 1985; Horovitz & Rigbi, 1985). One hundred natural
variants of a proteinase inhibitor, the ovomucoid third domain,
have been isolated and sequenced from the eggs of different
bird species (Empie & Laskowski, 1982; Laskowski et al.,
1987). This is a nested set of proteins because for any one
of these avian inhibitors there is a close relative containing only
one or a few amino acid substitutions. Moreover, the association
constants ($K_a$) of these inhibitors with a variety of serine
proteinases vary over an enormous range (10^?-fold). Laskowski
et al. (1983, 1989) have shown that the effect of a given residue
replacement on $K_a$ is about the same irrespective of the
inhibitor scaffold the replacement is made in.

In addition to ovomucoid, four additivity examples have been
constructed from natural variants at the subunit interface of
tetrameric hemoglobin (Ackers & Smith, 1985). Three additivity
examples have been analyzed for interactions of hGH
with its receptor (B. C. Cunningham and J. A. Wells, unpublished
results) and one example for association of synthetic
variants of the RNase S peptide with RNase S protein
(Mitchinson & Baldwin, 1986). The entirety of this data set
is not tabulated because much on the ovomucoid inhibitors
and hGH is unpublished. Nonetheless, these researchers were
kind enough to provide their data formatted so it could be
plotted collectively in Figure 4. These data consist of 91
additivity examples (80 in ovomucoids alone), representing 22
multiple mutants across four different proteins, and span a
wide range of change in binding free energy (-10 to +7







$\sum \Delta\Delta G_{binding}$ components

FIGURE 4: Plot showing the sum of changes in free energies of binding
at protein-protein interfaces for component mutants versus the
corresponding multiple mutant. Data represent interactions between
ovomucoid third domain and various serine proteases (□) (R. Wynn
and M. Laskowski, personal communication), regulatory interface
of α1β2 hemoglobin (●) (Ackers & Smith, 1985), hGH and its receptor
(stippled △) (B. Cunningham and J. Wells, personal communication),
and RNase S peptide and S protein (■) (Mitchinson & Baldwin,
1986).



kcal/mol). The plot shows a very strong linear correlation ($R^2$
= 0.96) with a slope near unity. Although the data for the
ovomucoid were not sorted to evaluate changes at intramo- 
lecular contact sites, most are not expected to be in contact, 
and all of the other examples represent noncontact sites. Thus, 
the large data base derived from natural variants of ovomucoid 
third domain, as well as a smaller number of examples from 
several other proteins, indicates that multiple mutations at 
protein-protein interfaces commonly produce simple additive 
effects. 

Additive Effects in DNA-Protein Interactions 

One of the clear advantages in analyzing DNA-protein 
interactions is the ability to apply powerful selections that make 
analysis by random mutational studies feasible. Additivity 
in DNA-protein interactions was first demonstrated by reversion
analysis of λ repressor (Nelson & Sauer, 1985). A
mutation that decreased the binding affinity for the λ operator
site (K4Q) was reverted by mutations at several second sites
(E34K, G48S, and E83K). When these second-site revertants
were introduced into wild-type λ repressor, they caused increases
in affinity similar to those observed in the first-site
suppressor mutant (K4Q).

Functional independence for mutations at DNA-protein 
contacts has been demonstrated by additive effects for mutants 
of CAP (catabolite gene activator protein) and its operator 
sequence (Ebright et al., 1987) as well as lac repressor and 
its corresponding operator sequence (Ebright, 1986). Simple 
additivity of mutational effects in the operator sequences for
Cro repressor (Takeda et al., 1989) and λ repressor (Sarai &
Takeda, 1989) has been most systematically demonstrated.
Simple additivity has also been reported for multiple mutations
in the lac repressor (Lehming et al., 1990). In fact, simple
additivity is so predictable in DNA-protein interactions that
the observation of complex additivity has been used to predict
specific DNA-protein contacts in the lac repressor-operator
complex (Ebright, 1986).

Additive Effects on Protein Stability

The first systematic analysis of additive effects of site- 
specific mutations on protein stability was reported by Shortle 
and Meeker (1986). Five multiple mutants in staphylococcal 





Table IV

mutant                     method         components (kcal/mol)   sum     multiple mutant

Staphylococcal Nuclease
V66L + G79S^a                             +0.2 -2.9               -2.7    -3.6
V66L + G88V                GuHCl          -0.2 -1.0               -1.2    -2.1
                                          +0.2 -0.9               -0.7    ...
I18M + A69T                GuHCl          -0.6 -2.7               -3.3    -2.8
                                          -0.7 -2.9               -3.6    -3.8
I18M + A90S                GuHCl          -0.6 -1.4               -2.0    -2.2
                                          -0.7 -1.4               -2.1    -2.2
V66L + G79S + G88V         GuHCl          -0.2 -2.6 -1.0          -3.8    -3.0
                                          +0.2 -2.9 -0.9          -3.6    -3.4

N-Terminal Domain of λ Repressor
G46A + G48A^b              thermal melt   +0.7 +0.9               +1.6    +1.1

T4 Lysozyme
I3C + C54V                 thermal melt   +1.2 -0.7               +0.5    +0.4
I3C + C54T                 thermal melt   +1.2 +0.3               +1.5    +1.5
I3C + C54T + R96H          thermal melt   +1.2 +0.3 -2.8          -1.3    -2.5
I3C,C54T + R96H            thermal melt   +1.5 -2.8               -1.3    -2.5
I3C + C54T + A146T         thermal melt   +1.2 +0.3 -1.5           0      -0.5
I3C,C54T + A146T           thermal melt   +1.5 -1.5                0      -0.5

Bacteriophage f1 Gene V
V35I + I47V^d              GuHCl          -0.4 -2.4               -2.8    -2.9

Kringle-2 of tPA
H64Y + R68C                thermal melt   +2.9 +0.7               +3.6    +3.4

Turkey Ovomucoid Third Domain
G32A + ...                 thermal melt   +0.8 -0.5               +0.3    +0.2
Y20H + N45-CHO             thermal melt   -0.8 +0.3               -0.5    -0.6

α Subunit of E. coli Trp Synthetase
Y175C + G211E^g            GuHCl          -0.1 +0.3               +0.2    -1.3



a Shortle and Meeker (1986). b Hecht et al. (1986). c Wetzel et al.
(1988). d Sandberg and Terwilliger (1989). e R. Kelley, personal
communication. f Otlewski and Laskowski (1990). N45-CHO refers
to glycosylated Asn45. g Hurle et al. (1986).

nuclease were constructed from a group of random single
mutants that were screened initially for their ability to affect
the stability of the enzyme in vivo. The component mutants
do not make direct contact with each other in the multiple
mutants. Generally, these variants exhibit nearly additive
effects except for the double mutant V66L,G88V (Table IV).
In addition to those of staphylococcal nuclease, additive effects
on the $\Delta\Delta G_{unfolding}$ (assayed by reversible denaturation) have
also been determined for the N-terminal domain of λ repressor
(one example; Hecht et al., 1986), the α-subunit of E. coli Trp
synthetase (one example; Hurle et al., 1986), T4 lysozyme (six
examples; Wetzel et al., 1988), the gene V product of bacteriophage
f1 (one example; Sandberg & Terwilliger, 1989),














$\sum \Delta\Delta G_{unfolding}$ components

FIGURE 5: Plot showing sum of changes in free energy of unfolding
of component mutants and resulting multiple mutant. Data are taken
from Table IV and represent staphylococcal nuclease (■), N-terminal
domain of λ repressor (0), T4 lysozyme (○), bacteriophage f1 gene
V product (●), Kringle-2 domain of tissue plasminogen activator (△),
turkey ovomucoid third domain (□), and the α-subunit of Trp
synthetase (▽). The dashed line represents a theoretical line of unity
slope, and the solid line represents the best fit.

natural variants of ovomucoid third domain (two examples;
Otlewski & Laskowski, 1990), and the Kringle-2 domain of
human tissue plasminogen activator (t-PA) (one example; R.
Kelley, personal communication).

Collectively, this data set gives a high linear correlation ($R^2$
= 0.94) and slope near unity (Figure 5). The generally simple
additive behavior is somewhat surprising given the highly
cooperative nature of protein folding. There are discrepancies
in some of the additivity examples besides the staphylococcal
nuclease mutant (V66L,G88V). For example, the 1.5
kcal/mol discrepancy for the Y175C,G211E double mutant
in Trp synthetase (Table IV) is proposed to result from the
fact that these residues are in direct contact (Hurle et al.,
1986). Furthermore, proximity effects may account for the
large differences between the sum of the component mutants
and the multiple mutants for the α-helical double glycine
mutant G46A,G48A in λ repressor (Hecht et al., 1986), and
when combining R96H with the C3-C97 disulfide mutant in
T4 lysozyme (Wetzel et al., 1988). In contrast, an exchange
of two side chains that contact one another (V35I and I47V)
in the hydrophobic core of the gene V product of f1 phage
produced simple additive effects (Sandberg & Terwilliger,
1989; Table IV). It should be noted that this data base exhibiting
simple additivity may be biased for single mutants
that stably fold, because severely unstable proteins are more
difficult to express.
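
The kind of comparison summarized in Figure 5 can be sketched with an ordinary least-squares fit of the multiple-mutant values against the sums of their components. The example below is illustrative only: it uses a subset of the legible Table IV rows (kcal/mol), so it is not expected to reproduce the reported correlation exactly, and the names `fit_line` and `TABLE_IV_SUBSET` are hypothetical.

```python
# Hypothetical sketch: least-squares line through (sum of components, multiple mutant).
# Values are kcal/mol, a subset of rows transcribed from Table IV.

# (mutant, sum of component values, value for the multiple mutant)
TABLE_IV_SUBSET = [
    ("V66L,G88V", -1.2, -2.1),
    ("I18M,A69T", -3.3, -2.8),
    ("I18M,A90S", -2.0, -2.2),
    ("V66L,G79S,G88V", -3.8, -3.0),
    ("V35I,I47V", -2.8, -2.9),
    ("G46A,G48A", 1.6, 1.1),
    ("I3C,C54V", 0.5, 0.4),
    ("H64Y,R68C", 3.6, 3.4),
]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    sums = [row[1] for row in TABLE_IV_SUBSET]
    multiples = [row[2] for row in TABLE_IV_SUBSET]
    slope, intercept, r_squared = fit_line(sums, multiples)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.2f}")
```

A slope near one with a high correlation coefficient corresponds to simple additivity; a slope well below one, as in Figure 2, indicates that the components overestimate the effect seen in the multiple mutant.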

By analogy to transition-state binding effects, one can
certainly imagine instances where the stabilizing effects of
mutations should reach a plateau. For example, denaturation
at high temperatures can become controlled by a chemical step
such as deamidation (Ahern et al., 1987), so that additional
mutants that stabilize the folded form of the protein may be
irrelevant. Another obvious example where complex additivity
can be observed in protein stability is the stabilizing effect of
disulfide bonds and noncovalent intramolecular contacts that
require interactions between two or more residues. In these
cases, the stabilizing interaction between two side chains can
be broken with only one mutation.

Applications of Additivity in Rational Protein
Design

A strategy of additive mutagenesis, where a series of single 
mutants each making a small improvement in function are 



combined, is one of the most powerful tools in designing
functional properties in proteins. This approach has been
remarkably successful in stabilizing proteins to irreversible
inactivation, such as λ repressor (Hecht et al., 1986), subtilisin
(Bryan et al., 1987; Cunningham & Wells, 1987; Pantoliano
et al., 1989), kanamycin nucleotidyltransferase (Liao et al.,
1986; Matsumura et al., 1986), neutral protease (Imanaka et al.,
1986), and T4 lysozyme (Wetzel et al., 1988; Matsumura et
al., 1989). This strategy has been applied to enhancing the
catalytic efficiency of a weakly active variant of subtilisin
(Carter et al., 1989), engineering the substrate specificity of
subtilisin (Wells et al., 1987a,b; Russell & Fersht, 1987) and
the coenzyme specificity of glutathione reductase (Scrutton
et al., 1990), designing protease inhibitors with exquisite
protease specificity (Laskowski et al., 1989), and recruiting
human prolactin to bind to the hGH receptor (Cunningham
et al., 1990). In addition, additivity principles have been used
to engineer the pH profile of subtilisin (Russell & Fersht,
1987) and to design the affinity and specificity of λ repressor
(Nelson & Sauer, 1985).

For this approach to work, it is not required that all the
component mutants act in a simply additive manner but just
that their effects accumulate. For example, despite the complex
additivity of effects in the catalytic triad of subtilisin, there
are mutagenic pathways that are energetically cumulative for
installing the triad (Carter & Wells, 1988; Wells et al., 1987c).
Starting with the triple mutant S221A,H64A,D32A, there is
a progressive enhancement for installing Ser221 (-1.1 kcal/mol),
then His64 (-1.0 kcal/mol), and finally Asp32 (-6.5
kcal/mol). Another cumulative pathway of Ser221, then
Asp32, and finally His64 is possible if the Ser221,Asp32
intermediate were to use His P2 substrates (Carter & Wells,
1987). Elaborating such cumulative pathways is important
for understanding how a catalytic apparatus may have evolved
and is practically useful for considering how to install such
catalytic machinery into weakly active catalytic antibodies.
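
The cumulative pathway described above can be checked by stepping through the Table III values for each intermediate. The sketch below is a hedged illustration: the values (kcal/mol, relative to wild type, which is taken as zero by definition) are read from Table III as it appears in this copy, and the names `PATHWAY`, `stepwise_changes`, and `is_cumulative` are hypothetical.

```python
# Hypothetical sketch: stepwise free energy changes along a mutagenic pathway
# that installs the catalytic triad of subtilisin. Values are kcal/mol
# relative to wild type, read from Table III.

PATHWAY = [
    ("S221A,H64A,D32A", 8.65),  # starting point: all three triad residues replaced
    ("H64A,D32A", 7.48),        # Ser221 installed
    ("D32A", 6.52),             # His64 installed
    ("wild type", 0.00),        # Asp32 installed
]


def stepwise_changes(pathway):
    """Return (step label, change in kcal/mol) for each consecutive pair."""
    return [
        (f"{start} -> {end}", round(end_value - start_value, 2))
        for (start, start_value), (end, end_value) in zip(pathway, pathway[1:])
    ]


def is_cumulative(pathway):
    """True if every step lowers the free energy, i.e. each step improves catalysis."""
    return all(change < 0 for _, change in stepwise_changes(pathway))


if __name__ == "__main__":
    for label, change in stepwise_changes(PATHWAY):
        print(f"{label:32s} {change:+.2f} kcal/mol")
    print("cumulative:", is_cumulative(PATHWAY))
```

Each step is favorable even though the three residues are far from simply additive, which is the distinction the text draws between additive and cumulative effects.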

Conclusions 

In the majority of cases, combination of mutations that
affect substrate or transition-state binding, protein-protein
interactions, DNA-protein recognition, or protein stability
exhibits simple additivity. Simple additivity is commonly
observed for distant mutations at rigid molecular interfaces
such as in protein-protein and DNA-protein interactions,
where the mutations are unlikely to alter grossly the structure
or mode of binding.

Large deviations from simple additivity can occur when the 
sites of mutations strongly interact with one another (by 
making direct contact or indirectly through electrostatic in- 
teractions or large structural perturbations) and/or when both 
sites function cooperatively (as for the catalytic triad and 
oxyanion binding site of subtilisin). Changes at sites that can 
contact each other do not always lead to complex additivity; 
this may reflect relatively weak interactions between the two 
sites or indicate that the interactions are compensatory and 
appear to be weak. 

It is important to point out the magnitude of errors in
predicting the free energy effect in the multiple mutant from
the component single mutants. Generally, for those cases
exhibiting simple additivity (Figures 1, 4, and 5), the discrepancy
in free energy between the sums of the components
and multiple mutants is about ±25%. Part of this is the result
of compounding errors when summing the single mutants, and
the rest is presumably due to weak interaction terms.
Nonetheless, this means that if the total free energy change
is about 3 kcal/mol, the change in the equilibrium constant






(related by $10^{\Delta G/2.303RT} \approx 155$) will often be off by a
factor of 4. Thus, while the free energy effects accumulate,
significant deviations will occur in predicting the final equilibrium
constants when component mutants contribute a large
free energy term.
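
The arithmetic behind this estimate can be sketched as follows. The example assumes a temperature of 298.15 K, which the text does not state, so the computed fold change is close to, but not exactly, the quoted value of 155. The function names are hypothetical.

```python
# Hypothetical sketch: converting a free energy change into a fold change in an
# equilibrium constant, and the uncertainty that a fractional error introduces.
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change(delta_g, temperature=298.15):
    """Fold change in the equilibrium constant for |delta_g| in kcal/mol.

    The temperature default is an assumption; the source does not give one.
    """
    return math.exp(abs(delta_g) / (R_KCAL * temperature))


def error_factor(delta_g, fractional_error=0.25, temperature=298.15):
    """Factor by which the predicted constant is off for a fractional error in delta_g."""
    return fold_change(fractional_error * delta_g, temperature)


if __name__ == "__main__":
    print(f"3 kcal/mol corresponds to a {fold_change(3.0):.0f}-fold change")
    print(f"a 25% error in 3 kcal/mol gives a factor of {error_factor(3.0):.1f}")
```

Under this assumption a 25% error in a 3 kcal/mol change corresponds to a factor between 3 and 4 in the equilibrium constant, consistent with the rounded figure in the text.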

Simple additivity reflects the modularity of component 
amino acids in protein function. This results from the fact 
that the perturbations in energetics and structure resulting 
from most mutations are highly localized. In the past six years, 
an additive mutagenesis strategy has been extremely effective 
in engineering proteins — of course, nature has been using this 
strategy much longer. 

Acknowledgments 

I am grateful to Dr. Michael Laskowski and Rich Wynn 
for providing their data prior to publication on ovomucoid third 
domain and similarly to Brian Cunningham, Paul Carter, and 
Robert Kelley for making available their unpublished data. 
I am indebted for useful discussions with Drs. Michael Las- 
kowski, Paul Carter, Jack Kirsch, and Tony Kossiakoff and 
many of my colleagues at Genentech and to Drs. Richard 
Ebright and William Jencks and those above for critical 
reading of the manuscript. 

Registry No. RNase, 9001-99-4; tyrosyl-tRNA synthetase,
9023-45-4; trypsin, 9002-07-7; dihydrofolate reductase, 9002-03-3;
subtilisin BPN', 9014-01-1; glutathione reductase, 9001-48-3;
staphylococcal nuclease, 9013-53-0; lysozyme, 9001-63-2; plasminogen
activator, 105913-11-9; tryptophan synthetase, 9014-52-2.

References 

Ackers, G. K., & Smith, F. R. (1985) Annu. Rev. Biochem. 54, 597-629.

Ahern, T. J., Casal, J. I., Petsko, G. A., & Klibanov, A. M. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 675-679.

Alber, T., Dao-pin, S., Wilson, K., Wozniak, J. A., Cook, S. P., & Matthews, B. W. (1987) Nature 330, 41-46.

Albery, W. J., & Knowles, J. R. (1976) Biochemistry 15, 5631-5640.

Ardelt, W., & Laskowski, M., Jr. (1990) J. Mol. Biol. (submitted for publication).

Bott, R., Ultsch, M., Wells, J., Powers, D., Burdick, D., Struble, M., Burnier, J., Estell, D., Miller, J., Graycar, T., Adams, R., & Power, S. (1987) ACS Symposium Series 334 (LeBaron, H. M., Mumma, R. O., Honeycutt, R. C., & Deusing, J. H., Eds.) pp 139-147, American Chemical Society, Washington, DC.

Brown, K. A., Brick, P., & Blow, D. M. (1987) Nature 326, 416-418.

Bryan, P. N., Rollence, M.-L., Pantoliano, M. W., Wood, J., Finzel, B. C., Gilliland, G. L., Howard, A. J., & Poulos, T. L. (1987) Proteins: Struct., Funct., Genet. 1, 326-334.

Carter, P., & Wells, J. A. (1987) Science 237, 394.

Carter, P., & Wells, J. A. (1988) Nature 332, 564-568.

Carter, P., & Wells, J. A. (1990) Proteins: Struct., Funct., Genet. (in press).

Carter, P., Nilsson, B., Burnier, J. P., Burdick, D., & Wells, J. A. (1989) Proteins: Struct., Funct., Genet. 6, 240-248.

Carter, P. J., Winter, G., Wilkinson, A. J., & Fersht, A. R. (1984) Cell 38, 835-840.

Chothia, C., & Lesk, A. (1986) EMBO J. 5, 823-826.

Craik, C. S., Largman, C., Fletcher, T., Roczniak, S., Barr, P. J., Fletterick, R., & Rutter, W. J. (1985) Science 228, 291-297.

Cunningham, B. C., & Wells, J. A. (1987) Protein Eng. 1, 319-325.

Cunningham, B. C., Henner, D. J., & Wells, J. A. (1990) Science 247, 1461-1465.

Ebright, R. H. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 303-307.

Ebright, R. H., Kolb, A., Buc, H., Kunkel, T. A., Krakow, J. S., & Beckwith, J. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 6083-6087.

Empie, M. W., & Laskowski, M., Jr. (1982) Biochemistry 21, 2274-2284.

Fersht, A. (1985) in Enzyme Structure and Mechanism, 2nd ed., Chapters 3, 12, and 13, W. H. Freeman and Co., New York.

Fersht, A. R. (1987) Biochemistry 26, 8031-8037.

Fersht, A. R., Wilkinson, A. J., Carter, P., & Winter, G. (1985) Biochemistry 24, 5858-5861.

Hecht, M. H., Sturtevant, J. M., & Sauer, R. T. (1986) Proteins: Struct., Funct., Genet. 1, 43-46.

Horovitz, A., & Rigbi, M. (1985) J. Theor. Biol. 116, 149-159.

Howell, E. E., Villafranca, J. E., Warren, M. S., Oatley, S. J., & Kraut, J. (1986) Science 231, 1123-1128.

Hurle, M. R., Tweedy, N. B., & Matthews, C. R. (1986) Biochemistry 25, 6356-6360.

Imanaka, T., Shibazaki, M., & Takagi, M. (1986) Nature 324, 695-697.

Jencks, W. P. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 4046-4050.

Jones, M. D., Lowe, D. M., Borgford, T., & Fersht, A. R. (1986) Biochemistry 25, 1887-1891.

Katz, B. A., & Kossiakoff, A. (1986) J. Biol. Chem. 261, 15480-15485.

Laskowski, M., Jr., Tashiro, M., Empie, M. W., Park, S. J., Kato, I., Ardelt, W., & Wieczorek, M. (1983) in Protease Inhibitors: Medical and Biological Aspects (Katunuma, N., Ed.) pp 55-68, Japan Scientific Societies Press, Tokyo, Japan.

Laskowski, M., Jr., Kato, I., Ardelt, W., Cook, J., Denton, A., Empie, M. W., Kohr, W. J., Park, S. J., Parks, K., Schatzley, B. L., Schoenberger, O. L., Tashiro, M., Vichot, G., Whatley, H. E., Wieczorek, A., & Wieczorek, M. (1987) Biochemistry 26, 202-221.

Laskowski, M., Jr., Park, S. J., Tashiro, M., & Wynn, R. (1989) in Protein Recognition of Immobilized Ligands, UCLA Symposia on Molecular and Cellular Biology (Hutchens, T. W., Ed.) Vol. 80, pp 149-160, A. R. Liss, New York.

Leatherbarrow, R. J., Fersht, A. R., & Winter, G. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 7840-7844.

Lehming, N., Sartorius, J., Kisters-Woike, B., von Wilcken-Bergmann, B., & Müller-Hill, B. (1990) EMBO J. 9, 615-621.

Liao, H., McKenzie, T., & Hageman, R. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 576-580.

Lowe, D. M., Fersht, A. R., Wilkinson, A. J., Carter, P., & Winter, G. (1985) Biochemistry 24, 5106-5109.

Matsumura, M., Yasumura, S., & Aiba, S. (1986) Nature 323, 356-358.

Matsumura, M., Signor, G., & Matthews, B. W. (1989) Nature 342, 291-294.

Matthews, B. W. (1987) Biochemistry 26, 6885-6888.

Mayer, R. J., Chen, J.-T., Taira, K., Fierke, C. A., & Benkovic, S. J. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 7718-7720.

Mitchinson, C., & Baldwin, R. L. (1986) Proteins: Struct., Funct., Genet. 1, 23-33.



Biochemistry 1990, 29, 8517-8521 



8517 



Nelson, H. C. M., & Sauer, R. T. (1985) Cell 42, 549-558. 
Otlewski, J., & Laskowsld, M„ Jr. (1990) (submitted for 
publication). 

Pantoliano, M. W., Whitlow, M., Wood, J. F., Dodd, S. W., 
Hardman, K. D., Rollence, M. L., & Bryan, P. N. (1989) 
Biochemistry 28, 7205-7213. 

Russell, A. J., & Fersht, A. R. (1987) Nature 328, 496-500. 

Sandberg, W. S., & Terwilliger, T. C. (1989) Science 245,
54-57.

Sarai, A., & Takeda, Y. (1989) Proc. Natl. Acad. Sci. U.S.A.
86, 6513-6517.

Schechter, I., & Berger, A. (1967) Biochem. Biophys. Res.
Commun. 27, 157-162.

Scrutton, N. S., Berry, A., & Perham, R. N. (1990) Nature
343, 38-43.

Shortle, D., & Meeker, A. K. (1986) Proteins: Struct., Funct.,
Genet. 1, 81-89.

Takeda, Y., Sarai, A., & Rivera, V. M. (1989) Proc. Natl.
Acad. Sci. U.S.A. 86, 439-443.



Wells, J. A., & Estell, D. A. (1988) Trends Biochem. Sci. 13, 
291-297. 

Wells, J. A., Cunningham, B. C., Graycar, T. P., & Estell,
D. A. (1987a) Proc. Natl. Acad. Sci. U.S.A. 84, 5167-5171.

Wells, J. A., Powers, D. B., Bott, R. R., Graycar, T. P., &
Estell, D. A. (1987b) Proc. Natl. Acad. Sci. U.S.A. 84,
1219-1223.

Wells, J. A., Cunningham, B. C., Graycar, T. P., Estell, D.
A., & Carter, P. (1987c) Cold Spring Harbor Symp. Quant.
Biol. 52, 647-652.

Wetzel, R., Perry, L. J., Baase, W. A., & Becktel, W. J. (1988)
Proc. Natl. Acad. Sci. U.S.A. 85, 401-405.

Wilde, J. A., Bolton, P. H., Dell'Acqua, M., Hibler, D. W.,
Pourmotabbed, T., & Gerlt, J. A. (1988) Biochemistry 27,
4127-4132.

Wilkinson, A. J., Fersht, A. R., Blow, D. M., & Winter, G.
(1983) Biochemistry 22, 3581-3586.

Wilkinson, A. J., Fersht, A. R., Blow, D. M., Carter, P., &
Winter, G. (1984) Nature 307, 187-188.



Accelerated Publications 



Role of Tyrosine M210 in the Initial Charge Separation of Reaction Centers of
Rhodobacter sphaeroides†

Ulrich Finkele, Christoph Lauterwasser, and Wolfgang Zinth
Physik Department der Technischen Universität, D-8000 München 2, FRG
Kevin A. Gray and Dieter Oesterhelt*
Max-Planck-Institut für Biochemie, D-8033 Martinsried, FRG
Received May 22, 1990; Revised Manuscript Received July 12, 1990 



abstract: Femtosecond spectroscopy was used in combination with site-directed mutagenesis to study the 
influence of tyrosine M210 (YM210) on the primary electron transfer in the reaction center of Rhodobacter 
sphaeroides. The exchange of YM210 to phenylalanine caused the time constant of primary electron transfer 
to increase from 3.5 ± 0.4 ps to 16 ± 6 ps while the exchange to leucine increased the time constant even 
more to 22 ± 8 ps. The results suggest that tyrosine M210 is important for the fast rate of the primary 
electron transfer. 
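
As a quick arithmetic check on the time constants quoted in the abstract, the minimal Python sketch below converts each time constant τ into a first-order rate constant k = 1/τ and reports the slowdown relative to wild type. The labels and function names are illustrative and do not come from the paper, and the quoted uncertainties are ignored.

```python
# Time constants of primary electron transfer quoted in the abstract (ps).
# Labels are illustrative; the "±" uncertainties are ignored in this sketch.
TIME_CONSTANTS_PS = {
    "wild type (Tyr M210)": 3.5,
    "Tyr -> Phe": 16.0,
    "Tyr -> Leu": 22.0,
}


def rate_constant(tau_ps):
    """First-order rate constant k = 1/tau, in ps^-1."""
    return 1.0 / tau_ps


def slowdown(tau_ps, tau_ref_ps=3.5):
    """Factor by which a time constant exceeds the reference value."""
    return tau_ps / tau_ref_ps


if __name__ == "__main__":
    for label, tau in TIME_CONSTANTS_PS.items():
        print(f"{label}: k = {rate_constant(tau):.3f} ps^-1, "
              f"slowdown = {slowdown(tau):.1f}x")
```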



The primary photochemical event during photosynthesis of
bacteriochlorophyll- (Bchl-) containing organisms is a light-induced
charge separation within a transmembrane protein
complex called the reaction center (RC). The crystal structures
of RCs from Rhodopseudomonas (Rps.) viridis and
Rhodobacter (Rb.) sphaeroides have been solved to high
resolution [reviewed in Deisenhofer and Michel (1989), Chang
et al. (1986), Tiede et al. (1988), and Rees et al. (1989)]. The
RC from Rb. sphaeroides contains three protein subunits
referred to as L, M, and H, according to their respective
mobilities in SDS-polyacrylamide gels. Associated with the
L and M subunits are the cofactors, consisting of four Bchl
a, two bacteriopheophytin (Bph) a, one atom of non-heme
ferrous iron, two quinones (Q_A and Q_B), and in some species
one carotenoid [reviewed in Parson (1987) and Feher et al.

† Financial support was from the Deutsche Forschungsgemeinschaft,
SFB 143.

* To whom correspondence should be addressed.



(1989)]. The cofactors are arranged in two branches (Figure
1) with an approximate C2 axis of symmetry. The kinetic data
support a model in which the primary electron transfer proceeds
after light absorption by the primary donor [a special
pair of Bchl referred to as P; reviewed in Kirmaier and Holten
(1987)]. The absorption of light generates the excited electronic
state P*, which has a lifetime of approximately 3 ps.
An electron is transferred from P along only one branch (the
so-called A-branch). It is generally accepted that after approximately
3 ps the electron arrives at the Bph on the A-side
(H_A) and after 220 ps it reaches Q_A. The role of the accessory
Bchl located between P and H_A (referred to as B_A) has not
been definitely assigned. Recently, we have shown that at
room temperature an additional kinetic component (τ = 0.9 ps)
is detectable (Holzapfel et al., 1989). The spectral properties
and the kinetic constants lead to the conclusion that the
corresponding intermediate is the radical pair P+B_A− (Holzapfel
et al., 1990).
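
To illustrate how a short-lived intermediate behaves in a two-step scheme, the following Python sketch evaluates the standard closed-form populations for an irreversible sequence A -> B -> C. Mapping A, B and C onto P*, P+B_A− and P+H_A−, and assigning 3.5 ps to the first step and 0.9 ps to the second, are assumptions made here for illustration only; the text above does not spell out this kinetic scheme, and the function names are hypothetical.

```python
import math


def sequential_populations(t_ps, tau1_ps=3.5, tau2_ps=0.9):
    """Return (A, B, C) populations at time t_ps for A -> B -> C.

    Assumed scheme: A = P*, B = P+B_A-, C = P+H_A-, with irreversible
    first-order steps and tau1_ps != tau2_ps. A starts at population 1.
    """
    k1 = 1.0 / tau1_ps
    k2 = 1.0 / tau2_ps
    a = math.exp(-k1 * t_ps)
    b = k1 / (k2 - k1) * (math.exp(-k1 * t_ps) - math.exp(-k2 * t_ps))
    c = 1.0 - a - b
    return a, b, c


def peak_intermediate(tau1_ps=3.5, tau2_ps=0.9):
    """Time (ps) and height of the maximum population of intermediate B."""
    k1 = 1.0 / tau1_ps
    k2 = 1.0 / tau2_ps
    t_max = math.log(k2 / k1) / (k2 - k1)
    return t_max, sequential_populations(t_max, tau1_ps, tau2_ps)[1]


if __name__ == "__main__":
    t_max, b_max = peak_intermediate()
    print(f"intermediate peaks at {t_max:.2f} ps with population {b_max:.2f}")
```

Under these assumptions the intermediate decays faster than it forms, so its population stays small at all times, which is consistent with such a component being difficult to detect.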
Additional intriguing points concerning the process of 



0006-2960/90/0429-8517$02.50/0 © 1990 American Chemical Society




The Babel of Bioinformatics 



by the National Center
(NCBI), and
tags (ESTs).



Teresa K. Attwood

which ire psrfiil sequences of dona thai ire 

ringing from 27.4(2 to 1 12,278. The meih- often error prone, arc boused in public and 

rujor achievement, but the meaning of ods used to arrive it these numbers etch proprietary repositories. These number* will 

■ the mass of accumulated data is only involve different ipproximitlons and ex- snowball with the Guidon of father genome 

just beguwing to be unraveled At first right, trapoUtions. Nevertheless, it ii disturbing projects. By contrast, the number of unique 

(he task appears straightforward: locate the that the different analytical approaches protein structures is still lets than 2000. Of 

J "~K>e»- should yield such disparate results, course, we do not koow how many unique je- 

' qgences there arc; nevenheksa, it is clear thai k 



us us- a gene ii unclear, b U a h< 

ing known or modeWatvcd structures. Oiv- responding to an observable phenotype? 

emlhequanityordata,tbopttxwtL«Ta Aoutd Or is it a packet of genetic information sequences. 1w< 

be aulornatodM much as possible. _ that encodes a protein, or proteins? Or per- cs bxve emcrgec rtttern rocogniocn merh- 

Tbert^nfctRB^UjwtsosjrB^At- baps one thai encodes RHA? Must it be ods aim to detect similarity between se- 

" genom- translated? Are genea genes if lacy are not quenci* and structures arid infer reletoJ 

atmeth- expressed? As dcfuiitic^ viiy. bseviubry fuactjaos. Thus, rosy require some ctmrec- 

saofuhojotal.number.pf l^JoJnnsJBee}^ 



n the language of biological 




tsifonnitioo used 

signals in the sequence, content statistics,
and similarity to known genes. In a recent
test of gene detection tools on part of the
Drosophila genome, the majority of these
"gene finders" identified 95% of coding
nucleotides, but intron/exon structures
were correctly predicted for only about
40% of genes. The different methods failed
to find between 5% and 95% of genes, and
incorrectly identified up to 55% (1). But
probably the most sobering evidence of the
frailty of gene prediction methods is the
uncertainty in the number of genes in the
human genome, with current estimates



(ion thit emerge (mo tits) b«gb to prwUs functional does, 
reisers rotas In csbtyn binding, nudeetfda binding, and 
is prevldlni different scaffolds, which can be decor 



(AJ or a told (I) In babnasi we on say 1Mb about 
l or structures together do the patterns of conserva- 
functlonsl dues. For exirnplt. the ibon motifs (C) may 



Id and the bttcttoft tUew* us to ratlo- 



VOL 290 20 OCTOBER 2000



BEST AVAILABLE COPY



p^rt^. m«h«fa deduce rtructgrt dimaty 
ftom teqacsec lite i 



dffereal snd should not b« contused. Their Sudicamp 



function prediction through pattern 



?NCf \S COMfASS g| 

rubwavs sod new rpedficrtitt jSWly. and that o 
TLKStten and •dapuOon- Hon" dirTer. «•*»« 



motmorrsaacs. wn«. ^-»-r-Tr 
March, it nay be iwl^^f"/ 31 ™ 1 ^ 



Tool* lor similariry .-, — 

components of the sequence annotator's armory.
Sequence similarity programs may seek
pairwise similarities in large



torics or searen tor rOTerven pwcin. 

(amity dsUbases (*-J)-Gene family 

es allow more specific functional diagnoses to
be made than is possible by pairwise searching.
They are based on the principle that related
sequences can be aligned to find regions
(motifs) that show little variation. These motifs
usually reflect some vital structural or
functional role (see the figure), and they can
be used



match turned up by the search is the true ortholog
or a paralog. This difficulty is the
source of numerous annotation errors.

Further complications result from the
domain and modular nature of many proteins.
Modules are autonomous folding
units that often function as protein building
blocks, forming multiple combinations of
the same module or mosaics of different
modules. They can confer a variety of functions
on the parent protein. If the best hit in
a database search is a match to a single domain
or module, it is unlikely that the function
annotation can be propagated from the
parent protein to the



note, rneawanw uaus^ •» ~™>" 
be able to predict them rsliabty? 

Structure prediction methods range
from computationally intensive strategies
that simulate the physical and chemical
forces involved in protein folding to
knowledge-based approaches that use information
from structure databases to
build models. Yet the problem of predicting
protein structure remains unsolved:
knowledge-based techniques typically produce
low-resolution models, and no current
method yields reliable predictions for
remote homologs (12). For small proteins,
ab initio methods generate models with
substantial segments that resemble The cor- 



rect IOIO, OUl rcxaiu HWHW w»/-» » 

-100 residues. Today, knowledge-based 




aigtimnT apprp mom m*y «« ■mi™ . 
but common evolutionary origin remains a
hypothesis until supported by other evidence;
the hypothesis may be correct or mistaken,
but the similarity is a fact (ft



too." in an attempt to IrOroduce rigor mfrfbe 
fleWandbernsrrerlectbioto^realrty.ir^ 
pendent ontologies such as fcs Oeno Outtto- 

gy(">< 



ple, is a prediction a "good" prediction if
it correctly reproduces all atomic positions,
the topology (connectivity of secondary
structures), the architecture (gross
arrangement of secondary structures), or
merely the structural class (mainly α,
mainly β, etc.)? Where does a "reasonably
good" prediction fall in this hierarchy, and
what level of structural detail does a
"tough near miss" (/*) reveal? Using such







it is helpful to define our

terms precisely and be honest about our
achievements. Otherwise, we will continue
to be baffled by paradoxical new prediction
methods that yield >80% error rates. Gene
identification, structure prediction, and



rational tasks, but with the relentless accumulation
of sequence data, improvements
continue to be made in all areas.

Nature (unctions by integration, and die 
e holistic view of complex 



_. _ it from ge- 

nomic data, we need to take tccounl ttf in- 
*i oh the regulation of gene expxes- 
>d signaling cas- 



raveling these networks and their interactions
will be vital to our understanding of
normal and pathologic cell development,



SCKNCa » v»/rir«a 
TECHSIGHTING

SOFTWARE

Conquering by
Dividing

The average personal computer
spends much less than half a day
actually performing useful computations.
Many users, concerned about the
vulnerability of expensive electronic
components to the constant
cycling of the power on and
off, leave their systems on
continuously. It is staggering



of the program for Macintosh and 
laru syrtcmi are planned. 
The benefit* of the Popular Power 



Popular Power, Inc.
San Francisco, CA



puter is used. The flexible nature of Popular
Power's design provides access for
businesses, scientists, and anyone with
massive computing projects to computing
power that is potentially far greater than
they would gain from a fixed piece of
hardware. Personal computer users might
be able to select which commercial
job to run through
Popular Power Worker depending
on the return offered
by the originating contractor.
A key to the success of the
computing model is likely to
be the price Popular Power demands
for acting as the inter-



"discovery's 
heavily dependent on accurate functional 



wU need to deliver highly integrated^ Inter- 



several million PCs left running
unattended. One popular

approach to tapping this computing
power is the Search for Extra-Terrestrial
Intelligence (SETI) project (1),



r^a^HBtTiyconipany . 
feeing a new twist on this theme. Li 
SBTl. a company computer 
'of targe compoting probli 



pro sea suncn, ■» ,,7,, 7.T . r^L-m, < 

variety of computing problems to work ^~*^^^JJ^ ' 

on. Tt^incliade nonprofit projecu with «"*«»<*~«~> 
no financial incentive to the personal 




we must be in our thinking (and writing) if
we are to make sense of the complexities.
Sequence-structure-function bioinformatics
does not yet yield all the answers, but a
future holistic approach should help fuse
today's glimmerings of knowledge into a



TECHSIGHTING
SOFTWARE

Eyes on the Skies 




. urtitert. 

.IV KKrvdrntofLt-HtLV*-**. l»1 (WOO). 

u k. 1 s, m ..** s cw-oph. taw. ant a. xe 
uusfirsfl 

uv K.A.CUUMU. 



oof me Popular Pow- 
l Wbrkar runs only on VVindowi and Lin- 
ux systems and fat officially in pro-release 
form; Tha rwelfatlnary status of the toft- 
wars is readily apparent; numerous bugs, asm* be orbital apace above Earth con- 
fteonenl coshes, and doTicaltiei in inrtal- I tains an astonishing collection of 
ukoa oleum the program curnatry. If in- I man-made satellites. Traddng ail of . 
rtraad^thecompWW toU^ these objects is no small task. Liftoff isj a 
iato.persraBj»:rjomnailero^ NASA Web lite that provides several soft- 

in Popular Powers etraputmg model may wart tools to locate, track, and identify 
fuiddWag Vim ths pioUemscf the early ~ 
• release worth their whtti. Users of me pre- 



the offsets! 

Power Worker can be downloaded for free
from the company's Web site, and it installs





i orbiting Earth 
from a perspective far away in space).
Each of these platform-independent applications
is written in Java and is accessible



SCIENCE VOL 290 20 OCTOBER 2000 . 






From genes to protein structure and function: 
novel applications of computational 
approaches in the genomic era 

Jeffrey Skolnick and Jacquelyn S. Fetrow

The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding
the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate
because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for
prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets
in both the sequence and structural-genomics projects.

Genome-sequencing projects are providing a
detailed 'parts list' for life. Unfortunately, this list,
a portion of which represents the amino acid
sequence of all the proteins in a given genome, does
not come with an instruction manual. That is, given
the genome sequences, one does not necessarily know
straight away which regions encode proteins, which
serve a regulatory role and which are responsible for
the structure and replication of the DNA itself.
This is not unlike giving a child a list of parts nec-
a working automobile. Without the

The sequence-to-function approach is the most commonly
used function-prediction method. This robust
field is well developed and, in the interest of space
limitations, we will merely present a brief overview.
There are two main flavors of this approach: sequence





What is a protein function?

After a genome is sequenced and its complete parts
list determined, the next goal is to understand the function(s)
of each part, including that of the proteins. What
do we mean by protein function, the focus of this article?

Prosite, Blocks, Prints and Emotif. Both the
alignment and the motif methods are powerful but a
recent analysis has demonstrated their significant limitations 11,
suggesting that these methods will increasingly
fail as the protein-sequence databases become more
diverse.

An extension of these approaches that combines
protein-sequence with structural information has been



rein could be a globular protein, such as an enzyme, 
hormone or antibody, or it could be a structural or 

chemical function, such as the chemical reaction and 
the substrate specificity of an enzyme. The regulatory 

levels of biochemical function. 
" At the cellular level, the proteins function would 
involve its interacriou with other macromc' 
the function and cellular location of such 
There is also the proteins pbyaiolof 
is, in which metabolic pathway the 




function paradigm. Here, the goal is to determine the 
J. M.M p^te*"*"*"""* 1 **><*' D..5W. HmeSdma «n»«ure of the protein of interest and then to identify 
Cmw. IManutr of Cam,*Mlaul CnwmJo, 4041 ftmrf An* die functionally important residues in that structure. 
Aimc, Si Coi*i. MO «J 109, USA. J.S. r«» ii « CoaParmtiia, Using the chemical structure itself to identify functional 
Sum too, SUO OMSk On'vr, Smm ENcje, CA 93 121-J7S4, USA, sites is more in line with how the protein actually works. 



EXHIBIT 

1 0 



Id a it me, this is one long-term goal of 'structural 
genomics' projects" 1,1 ', which are designed to deter- 
mine all possible protein folds experimentally, just 
icing projects 
n . This is in a 



structural-biology approaches, in which one knows the
protein's function first and only then, if the function is

It is implicitly assumed that having the protein s struc- 
ture will provide insights into its function, thereby fur- 
thering the goals of the 1 




In order to use a structure-based approach to function
prediction, one must identify the key residues responsible
for a given biochemical activity. For many years,
it has been suggested that the active sites in proteins are
better conserved than the overall fold. Taken to the
re could not only identify dU- 
une global fold and the same 



distantly related, or possibly unrelated, global folds. 

The validity of this suggestion was demonstrated
empirically by Nussinov and co-workers, who showed
"" a of eukaryotic serine pn 



■t modeling study of 
Saccharomyces cerevisiae proteins, protein functional sites
were found to be more conserved than other parts of
the protein models. Similarly, it has been demonstrated
that the catalytic triad of the α/β hydrolases
is structurally better conserved than other histidine-
■ ■ - - - - B-ofthe 



A common protein characteristic that makes functional analysis based
only on homology especially difficult is the tendency of proteins to be
multifunctional. For instance, lactate dehydrogenase binds NAD, substrate
and zinc, and performs a redox reaction. Each of these occurs
at different functional sites that are in close proximity and the combination
of all four sites creates the fully functional protein.

Other examples of multifunctional proteins are the nucleic-acid-binding
proteins. For instance, DNA regulatory proteins often contain a DNA-binding
domain, a multimerization domain and additional sites that bind
regulatory proteins; a classic example is RecA. The 3C rhinovirus
protease exhibits a proteolytic function as well as an RNA-binding
function. Transcription factors are also complex, multifunctional
proteins. It is becoming increasingly important to recognize each of
these different functions of gene products of a newly sequenced gene.

The serine-threonine-phosphatase superfamily is a prime example of



phosphatase active site. Subfamilies 1, 2A and 2B exhibit 40% or more
sequence identity between them. However, each of these subfamilies
is apparently regulated differently in the cell and observation suggests
that there are different functional sites at which regulation can
occur. Because the sequence identity between subfamilies is so high,
standard sequence-similarity methods could easily misclassify new
sequences as members of the wrong subfamily if the functional sites
are not carefully considered, as was recently demonstrated.

These are but a few examples of the multifunctionality of proteins.
The recognition of this multifunctional nature is of critical importance
to the genomics field. Useful functional-annotation methods must consider
all of the specific functions in a given protein and will not just
provide a general classification of function.



several novel functional sites is known, high-quality 
protein structures. More automated methods for
finding spatial motifs in protein structures have also
been described.

jr, most of these methods require the 
: of atoms within protein backbones and 




in proteins; these efforts 

laces where they did not previously exist. This 
" 'ished for several metal- 



binding rite* 1 " 1 . However, highly ai 
site descriptors of the backbone and side-chain atoms were
required, fueling the belief that significant atomic detail
is required in site descriptors for function identification.

Highly detailed residue side-chain descriptors of the
active sites of serine proteases and related proteins have
been used to identify functional sites. The use of these
highly detailed mot" ' " 



known to 

exhibit the disulfide-oxidoreductase active site. In a
larger dataset of 1501 proteins, the FFF again accurately
identified all proteins with the active site. In addition,
it identified another protein, 1fjm, a serine-threonine
phosphatase. This result was initially discouraging but

strongly suggested that this putative site might indeed
be a site of redox regulation in the serine-threonine
rse-1 subfamily 41 . If con firmed by experiment, 
: will highlight the advantages of using srruc- 



ill you Its function 



Because proteins can have similar folds but different functions,
determining the structure of a protein may or may not tell you something
about its function. The most well-studied example is the (α/β)8
barrel enzymes, of which triose-phosphate isomerase (TIM) is the archetypal
representative. Members of this family have similar overall structures
but different functions, including different active sites, substrate
specificities and cofactor requirements.

Is this example common? Our own analysis of the 1997 SCOP database
shows that the five largest fold families are the ferredoxin-like,
the (α/β) barrels, the knottins, the immunoglobulin-like and the
flavodoxin-like fold families with 22, 18, 13, 9 and 9 subfamilies, respectively
(Fig. I). In fact, 57 of the SCOP fold families consr '
supertarrdies. These data oriy show the * of the teetn 
each superfarmr/ is further composed of protein famSes 



ribosomal proteins, WfAblnding proteins and pi 

After this article was submitted, a much-more-detailed analysis of the
SCOP database was published. This finds a broad function-structure
correlation for some structural classes, but also finds a number of
ubiquitous functions and structures that occur across a number of families.
The article provides a useful analysis of the confidence with which
structure and function can be correlated. Knowing the protein structure
by itself is insufficient to annotate a number of functional classes
and is also insufficient for annotating the specific details of protein



[Figure I. Histogram of the numbers of superfamilies found in each SCOP fold family. Horizontal axis: number of superfamilies per fold family. Labeled bars: triose-phosphate isomerase barrels, immunoglobulin-like and flavodoxin-like, ferredoxin-like.]



Tha state of the art In si 
methods 

For proteins whose sequence identity is above -30%, 
homology modeling to build the it 



: ab initio folding and threading.
In ab initio folding, one starts from a random conformation
and then attempts to assemble the native structure.
As this method does not rely on a library of
pre-existing folds, it can be used to predict novel
folds. The recent CASP3 protein-structure-prediction
experiment (http://PredictionCenter.llnl.gov/CASP3)
involved the blind prediction of the structure of pro-



: is no longer adequate for identifying
all functional sites in known protein structures

To date, the use of structure to identify function has
largely focused on high-resolution structures and highly
detailed descriptors of protein functional sites. However,
the creation of inexact descriptors for functional
sites opens the way to the application of these methods
to inexact, predicted protein models. The question



square deviation (RMSD) from n 
4-7 Å. Progress is being made with the β proteins, too,
although they remain problematic. Because ab initio
methods can identify novel folds, these methods could
be used to help to select sequences likely to yield novel
folds in experimental structural-genomics projects.
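
Since model quality here is quoted as a root-mean-square deviation (RMSD) from the native structure, a minimal Python sketch of the RMSD formula may help. It assumes the two coordinate sets are already superimposed and list equivalent atoms in the same order; the optimal superposition step that normally precedes an RMSD calculation is deliberately left out, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples in angstroms for equivalent atoms.
    Assumes the structures are already superimposed (no fitting is done).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    model = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]  # native shifted by (3, 4, 0)
    print(f"RMSD = {rmsd(native, model):.1f} angstroms")
```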

Another approach to tertiary-structure prediction is 
threading. Here, for the sequence of interest, one 
attempts to find the closest matching structure in a 
library of known folds. Threading is applicable to
proteins of up to 500 residues or so and is much faster
than ab initio approaches. However, threading cannot



Ab initio predicted models can be used for automatic
protein-function prediction

The results of the recent CASP3 competition suggest
that current modeling methods can often (but not
always) create inexact protein models. Are these structures
useful for identifying functional sites in proteins?
Using the ab initio structure-prediction
MONSSTER, the tertiary sr. r ' 



the overall backbone RMSD from tfc 
was 5.7 A. 

To determine whether this inexact model could be
used for function identification, the sets of correctly
and incorrectly folded structures were screened with
the FFF for disulfide-oxidoreductase activity 15. The
FFF uniquely identified the active site in the correctly
folded structure but not in the incorrectly folded ones

inexact models produced by ab initio prediction of
structure from sequence can be used for the subsequent
prediction of biochemical function. Of course, improvements
in the method have to be made before such
a be done on a routine bans. 



Use of predicted structures from threading in
protein-function prediction

At present, practical limitations preclude folding an
entire genome of proteins using ab initio methods.
Threading is more appropriate for achieving the requisite
high-throughput structure prediction. Thus, a standard
threading algorithm has been used to screen all



Review 



active site described above. 

First, sequences that aligned with the structures of
known disulfide oxidoreductases were identified. Then,
the structure was searched for matches to the active-site
residues and geometry. For those sequences for
which other homologs were available, a sequence-
is constructed 13 . If the putative 



Using this sequence-to-structure-to-function method,
9HH of the proteins in the nine genomes that have 
known disulfide-oxidoreductase activity have been
found. From 10% to 30%



similar results are seen for the α/β hydrolases 33. Surprisingly,
in spite of the fact that threading algorithms
have problems generating good sequence-to-structure



sequence-based approaches, as demonstrated by a
detailed comparison of the FFF structural approach and
the Blocks sequence-motif approach (N. Siew et al.,
unpublished). In this study, the sequences in eight
- - • {ft, were analysed for 
n using the disulfide- 
* FFF, the thioredoxin Block 001514 and 
tin Block 00195. If we assume that those
sequences identified by both the FFF and Blocks
are 'true positives', we find 13 such sequences in the
B. subtilis genome.

There is no experimental evidence validating all of
these 'true positives' and so they are more accurately
termed 'consensus positives'. In order to find these 13
'consensus positive' sequences, the FFF hits seven false
positives. On the other hand, Blocks hits 23 false
positives (Fig. 3). It '

a of false | 
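
The comparison above can be put in numbers with a small Python sketch. It treats the 13 'consensus positive' sequences as the true positives for both methods, which is the working assumption stated in the text rather than an experimentally validated ground truth; the function and variable names are illustrative.

```python
def precision(true_positives, false_positives):
    """Fraction of reported hits that are counted as true positives."""
    return true_positives / (true_positives + false_positives)


# Counts quoted in the text for the B. subtilis genome.
CONSENSUS_POSITIVES = 13   # sequences identified by both FFF and Blocks
FFF_FALSE_POSITIVES = 7
BLOCKS_FALSE_POSITIVES = 23

fff_precision = precision(CONSENSUS_POSITIVES, FFF_FALSE_POSITIVES)
blocks_precision = precision(CONSENSUS_POSITIVES, BLOCKS_FALSE_POSITIVES)

if __name__ == "__main__":
    print(f"FFF precision:    {fff_precision:.2f}")
    print(f"Blocks precision: {blocks_precision:.2f}")
```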



This is a critical question for re 



to- function method of function predictioi 

Based on studies to date, the identification of enzymatic
activity requires a model in which the backbone
RMSD from native near the active sites is about 4-5 Å.
Predicted models are better at describing the geometry
in the core of the molecule than in the loops and so




Itxny selected InorHatatyfc) triad and an 
other NsMnronWrdng triads has a un'modal distribution (b). Jht Hs-Sar-Asp 
catalytic trial In the protein-1 gpl (Rp2 lipase) (a) and a random hottfins-contar^ 
Wad from 4pga (gUamiruBa-asparaginass) (b) were studuraOy signed to «3 Hi- 
In a database of 1037 prota 




predicting the function of a protein whose active site is
in loops may be a problem. Also, the method can currently
only be applied to enzyme active sites; substrate-
and ligand-binding sites have not been identified using
the inexact models. Techniques that will further refine
inexact protein models will be quite useful in taking
the protein analysis to the next step.



will be needed to validate this or any other method
fully. This points out the need for closely coupling
computational function-prediction algorithms with



xy useful, altt 



function prediction have proved to be very i 
natives are needed to assign the tnochcinii 
of the 30-50% of proteins whose function c 
assigned by any current methods. One e 
approach involves the sequence-to-structure-to-function
paradigm. Such structures might be provided by structural-genomics
projects or by structure-prediction
algorithms. Functional assignment is made by screen-
e against a library of structural 



37 




Dalradl A. <r A (1995) The PROS1TE feubac, in 
NluUrAiUtRu. 24, 1B9-19A 

HoukhH S. and Haakon; J.C (1994) Pnxrfn tmily d> 




Review 



21 ftehcr. O. ti 4. (IW) Thn 




43 ByiuofT. C ud Ifakcr. O. (19981 Prediction of bul mucnm fa 
protelni wine • Mknuy of Mqicocctmciun: main. J. Mtt BM. 
381. 565-577 

4* Shordc. D. Ct999)n»octt ofdirirr. Our. OW. », R2D5-R209 
47 Let, J. if «i (IMI Cdcylidoa of pmh cemfarrari™, by (fatal 



41 Onh. A. a J. (1999) j4I 
49 flowte.J.UrtW.C19»l)Am, 



o.rWu3(Strppl),l7 




(.««<. (1999) fr 
mm of CASP3. Our. C*fa. Smuf. SU 9, 3*fl-J73 
SS MOW. R.T. «/ <i (1998) Protrio fidd recognition by wpieKt 
chrwEai nob and momai recnniquci. FASEBJ. 10, 171-178 
SC Ordl,A_R.«.l (l^r^MmUrofatullpnxtfaiudnjMootc 
Cado limuUdom drivto by reunion derlvod Cam irmlnpla 
it 277. 419-448 



to dM protcia jbbftnf problem. J. BltmtL Snvti. Dym. 1ft, 38 

"i, 1_ rt J. (1W8) Fold predicant by > MoarcBy oi 
1.7. 1431-1440 



pil.An.Owu.5iK.llB. 

34 WOacc, A-C a A (1997) TES5& Ai 
Ibr drivln, 3D cootrflD*. Ktnph>«> 

baa: appUadaa lo cnayate active thai, ftmfe Sri. 4, 1300-2313 
* CJ. (1999) 

t«. / AM. fliat 285. 1887-1897 
n, K. (1994) ~ 




Matdtm, OA.tr at. (l 1 

oyprfB-tt* polypeptide bid, RNA-btadb* (is en 
■ 0*77,781-771 

M. (1997) M 

ulaodpcB 




tub&ruly. FAXBBJ. 13. 1886-1874 
14 Sal. A. « at (1*95) Evaluattoa ofeompc 
MODBUJPR. rtoltfci 23. 318-32* 



39 



NCBI Blast:Protein Sequence (806 letters)



BLAST 



EXHIBIT 



Page 1 of 4 



Basic Local Alignment Search Tool 

Edit and Resubmit Save Search Strategies Formatting options Download 
Blast 2 sequences 

Protein Sequence (806 letters) 

Results for: lcl|43901 None(806aa)

Your BLAST job specified more than one input sequence. This box lets you choose which input sequence to 
show BLAST results for. 

Query ID: lcl|43901
Description: None
Molecule type: amino acid
Query Length: 806

Subject ID: 43903
Description: None
Molecule type: amino acid
Subject Length: 1106

Program: BLASTP 2.2.22+ Citation



Reference 

Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, 
and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 
Reference - compositional score matrix adjustment 

Stephen F. Altschul, John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. 
Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution 
matrices". FEBS J. 272:5101-5109. 

Search Parameters 



Search parameter name      Search parameter value

Program                    blastp
Word size                  3
Expect value               10
Hitlist size               100
Gapcosts                   11,1
Matrix                     BLOSUM62
Filter string              F



http://blast.ncbi.nlm.nih.gov/Blast.cgi



2/15/2010 






Genetic Code               1
Window Size                40
Threshold                  11
Composition-based stats    2

Karlin-Altschul statistics

Params    Ungapped    Gapped

Lambda    0.318991    0.267
K         0.133935    0.041
H         0.404134    0.14

Results Statistics parameter name    Results Statistics parameter value

Effective search space               809244
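
The relationship between the Karlin-Altschul parameters above and the bit scores and Expect values printed for each alignment can be illustrated with a short sketch. It assumes the standard conversion (bit score = (lambda x raw score - ln K) / ln 2, and E = search space x 2^(-bit score)) and uses the gapped Lambda and K and the effective search space from this report; the function names are hypothetical, and because the report uses compositional matrix adjustment, BLAST's own figures may differ slightly from this approximation.

```python
import math

# Gapped Karlin-Altschul parameters as printed in this report.
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.041


def bit_score(raw_score, lam=LAMBDA_GAPPED, k=K_GAPPED):
    """Convert a raw alignment score to a bit score (standard formula)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, search_space):
    """Approximate E-value for a bit score and an effective search space."""
    return search_space * 2.0 ** (-bits)


# Raw score 94 and effective search space 809244 are taken from this report.
bits = bit_score(94)
e_value = expect_value(bits, 809244)
print(f"{bits:.1f} bits, E = {e_value:.0e}")
```

Under these assumptions, a raw score of 94 corresponds to roughly 40.8 bits and an Expect value near 4e-07, in line with the alignment reported as "Score = 40.8 bits (94), Expect = 4e-07".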

Graphic Summary 

Distribution of 5 Blast Hits on the Query Sequence 






An overview of the database sequences aligned to the query sequence is shown. The score of each 
alignment is indicated by one of five different colors, which divides the range of scores into five groups. 
Multiple alignments on the same database sequence are connected by a striped line. Mousing over a hit 
sequence causes the definition and score to be shown in the window at the top, clicking on a hit sequence 
takes the user to the associated alignments. New: This graphic is an overview of database sequences 
aligned to the query sequence. Alignments are color-coded by score, within one of five score ranges. Multiple 
alignments on the same database sequence are connected by a dashed line. Mousing over an alignment 
shows the alignment definition and score in the box at the top. Clicking an alignment displays the alignment 
detail. 



Color key for alignment scores

[Graphic not reproduced; axis ticks at 0, 150, 300, 450, 600, 750]






Dot Matrix View 

Plot of lcl|43901 vs 43903

This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is
represented on the X-axis and the numbers represent the bases/residues of the query. The subject is
represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments
are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper
right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines
shown in the plot is the same as the number of alignments found by BLAST.
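
As a rough illustration of the description above, each alignment can be treated as a line segment running from its (query start, subject start) to its (query end, subject end). The sketch below is only an assumption about how such segments could be derived; the `Hsp` type and `dot_matrix_segment` function are hypothetical names, and the coordinates are copied from three of the alignments printed in the Alignments section of this report.

```python
from typing import NamedTuple, Tuple


class Hsp(NamedTuple):
    """Start and end coordinates of one alignment (hypothetical helper type)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def dot_matrix_segment(hsp: Hsp) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the segment endpoints with query on X and subject on Y."""
    return (hsp.q_start, hsp.s_start), (hsp.q_end, hsp.s_end)


# Coordinates copied from alignments of lcl|43901 (query) vs 43903 (subject).
hsps = [
    Hsp(563, 737, 229, 398),
    Hsp(684, 698, 434, 448),
    Hsp(162, 193, 699, 730),
]

for hsp in hsps:
    (x0, y0), (x1, y1) = dot_matrix_segment(hsp)
    # Protein matches run from the bottom left toward the upper right.
    print(f"({x0}, {y0}) -> ({x1}, {y1})")
```

Each segment rises from left to right, which matches the statement that protein matches are slanted from the bottom left to the upper right corner.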


Descriptions 

                                                 Score     E
Sequences producing significant alignments:      (Bits)  Value

lcl|43903 unnamed protein product                 62.8   1e-13



Alignments

>lcl|43903 unnamed protein product
Length=1106



Query  41   LTILANTTLQITCRGQRDLDWLWPNAQRDSEERVLVTECGGGDSIFCKTLTIPRWGNDT
            L + ++T +TC G + W +R S+E D F LT+ + G DT
Sbjct  42   LVLNVSSTFVLTCSGSAPWW ERMSQEPPQEM- AKAQDGTFS S VLTLTNLTGLDT

Query  101  GAYKCSYRD VDIASTVYVYVRDYRSPFIASVSDQHGIVYITENKNKTWIPCRGS  155
            G Y C++ D D +Y++V D F+ + +++ +++TE + IPCR +
Sbjct  96   GEYFCTHNDSRGLETDERKRLY I FVPDPTVGFLPNDAEEL - FIFLTEITE - - ITI PCRVT  152

Query  156  ISNLNVSLCARYPEKRFVPDGNRISWDSEIGFTLPSYMISYAGMVFCEAKINDETYQSIM  215
            L V+L + + + +D + GF+ SY C+ I D S
Sbjct  153  DPQLWTLHEKKGDVAL PVPYDHQRGFSGIFEDRSY ICKTTIGDREVDSDA  203

Query  216  YIVWVGYRIYDVILSPPHEIELSAGEKLVLNCTARTELNVGLDFTWHSPPSKSHHKKIV  275
            YV+ +V++ ++GE+LC N ++F W P +K
Sbjct  204  YYVYRLQVS S INVSVNAVQTV- VRQGENITLMC I VI G - - NEWNFEWTYP RKES  254

Query  276  [residues not legible]  330
Sbjct  255  [residues not legible]  314

Query  331  [residues not legible]  382
Sbjct  315  [residues not legible]  374

Query  383  [residues not legible]
Sbjct  375  [residues not legible]


Score = 40.8 bits (94), Expect = 4e-07, Method: Compositional matrix adjust. 
Identities = 45/189 (23%), Positives = 75/189 (39%), Gaps = 33/189 (17%)

Query 563 ESVSLLCTADRNTFENLTWYKLGSQATSVHMGESLTPVCKNL-DALWKLNGTMFSNSTND 621 

E+++L+C N NW ++ G+PV LD+ + 

Sbjct 229 EN I TLMC IV I GNE WNFE WT Y PR KE S GRLVEPVTDFLLDMPYHIRS 274 

Query 622 ILIVAFQNASLQDQGDYVCSAQDKKTKKRHCLVKQLIILE RMAPMITGNLENQTTT 677 

I+ +A L+D G Y C+ + + + ++E R+ + G L+

Sbjct 275 --ILHIPSAELEDSGTYTCNVTESVNDHQDEKAINITWESGYVRLLGEV-GTLQFAELH 331 

Query 678 IGETIEVTCPASGNPTPHITWFKDNETLVEDSGIVLRDGNRN LTIRRVRKE 728 






Query 729 DGGLYTCQA 737 

+ G YT +A 
Sbjct 390 EAGHYTMRA 398 

Score = 20.4 bits (41), Expect = 0.52, Method: Compositional matrix adjust. 
Identities = 14/67 (20%), Positives = 28/67 (41%), Gaps = 5/67 (7%)

Query 624 IVAFQNASLQDQGDYVCSAQDKK---TKKRHCLVKQLIILERMAPMITGNLENQTTTIGE 680

++ N + D G+Y C+D+ T+RL +++ ++E +E 

Sbjct 84 VLTLTNLTGLDTGEYFCTHNDSRGLETDERKRLY--IFVPDPTVGFLPNDAEELFIFLTE 141 

Query 681 TIEVTCP 687 
E+T P 

Sbjct 142 ITEITIP 148 

Score = 20.4 bits (41), Expect = 0.59, Method: Compositional matrix adjust. 
Identities = 7/15 (46%), Positives = 8/15 (53%), Gaps = 0/15 (0%) 

Query 684 VTCPASGNPTPHITW 698 

V C G P P+I W 
Sbjct 434 VRCRGRGMPQPNIIW 448 
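
The "Identities" and "Gaps" figures printed for each alignment can be checked by counting matching columns in the aligned rows. The following is a minimal sketch, assuming the two rows are given as equal-length strings with `-` marking gaps; the function names are hypothetical, and the use of truncated (not rounded) percentages is an inference from the figures printed in this report, not a documented rule.

```python
def alignment_stats(query, subject):
    """Count identical columns, gap columns, and alignment length."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


def percent(count, total):
    """Integer percentage, truncated toward zero."""
    return 100 * count // total


# Aligned rows copied from the alignment above (Query 684-698, Sbjct 434-448).
identities, gaps, length = alignment_stats("VTCPASGNPTPHITW", "VRCRGRGMPQPNIIW")
print(f"Identities = {identities}/{length} ({percent(identities, length)}%), "
      f"Gaps = {gaps}/{length} ({percent(gaps, length)}%)")
```

For this pair the count is 7 identical columns out of 15 with no gaps, which agrees with the printed "Identities = 7/15 (46%), ... Gaps = 0/15 (0%)". The "Positives" figure additionally depends on the BLOSUM62 scoring matrix and is not computed here.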

Score = 17.3 bits (33), Expect = 4.7, Method: Compositional matrix adjust. 
Identities = 8/32 (25%), Positives = 14/32 (43%), Gaps = 0/32 (0%) 

Query 162 SLCARYPEKRFVPDGNRISWDSEIGFTLPSYM 193 

+ + +KR P S +G LPS++ 

Sbjct 699 TFLQHHSDKRRPPSAELYSNALPVGLPLPSHV 730 






NCBI Blast:Protein Sequence (806 letters) 
BLAST 



Page 1 of 3 



Basic Local Alignment Search Tool 






Blast 2 sequences 



Protein Sequence (806 letters)

Results for: lcl|60337 None(806aa)

Your BLAST job specified more than one input sequence. This box lets you choose which input sequence to 
show BLAST results for. 



Query ID         lcl|60337
Description      None
Molecule type    amino acid
Query Length     806

Subject ID       60339
Description      None
Molecule type    amino acid
Subject Length   1091

Program          BLASTP 2.2.22+



Reference 

Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, 
and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 
Reference - compositional score matrix adjustment 

Stephen F. Altschul, John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. 
Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution 
matrices", FEBS J. 272:5101-5109. 

Search Parameters 



Search parameter name      Search parameter value

Program                    blastp
Word size                  3
Expect value               10
Hitlist size               100
Gapcosts                   11,1
Matrix                     BLOSUM62
Filter string              F






Genetic Code               1
Window Size                40
Threshold                  11
Composition-based stats    2

Karlin-Altschul statistics

Params    Ungapped    Gapped

Lambda    0.318991    0.267
K         0.133935    0.041
H         0.404134    0.14

Results Statistics

Results Statistics parameter name    Results Statistics parameter value

Effective search space               799624
Graphic Summary 

Distribution of 1 Blast Hits on the Query Sequence 




An overview of the database sequences aligned to the query sequence is shown. The score of each 
alignment is indicated by one of five different colors, which divides the range of scores into five groups. 
Multiple alignments on the same database sequence are connected by a striped line. Mousing over a hit 
sequence causes the definition and score to be shown in the window at the top, clicking on a hit sequence 
takes the user to the associated alignments. New: This graphic is an overview of database sequences 
aligned to the query sequence. Alignments are color-coded by score, within one of five score ranges. Multiple 
alignments on the same database sequence are connected by a dashed line. Mousing over an alignment 
shows the alignment definition and score in the box at the top. Clicking an alignment displays the alignment 







Dot Matrix View 




Plot of lcl|60337 vs 60339

This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is 
represented on the X-axis and the numbers represent the bases/residues of the query. The subject is 
represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments 
are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper 
right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines 
shown in the plot is the same as the number of alignments found by BLAST. 




Descriptions 

                                                 Score     E
Sequences producing significant alignments:      (Bits)  Value

lcl|60339 unnamed protein product                 16.5    7.7



Alignments

>lcl|60339 unnamed protein product 
Length=1091

Score = 16.5 bits (31), Expect = 7.7, Method: Compositional matrix adjust. 
Identities = 8/19 (42%), Positives = 8/19 (42%), Gaps = 0/19 (0%) 

Query  472   PGQTSPYACKEWRHVEDFQ  490
             P Q  P A  E   V D Q
Sbjct  1071  PSQVLPPASPEGETVADLQ  1089






NCBI Blast:Protein Sequence (806 letters) 



BLAST 




EXHIBIT 



Page 1 of 5 



Basic Local Alignment Search Tool 



Blast 2 sequences 

Protein Sequence (806 letters) 

Results for: lcl|40585 None(806aa)

Your BLAST job specified more than one input sequence. This box lets you choose which input sequence to 
show BLAST results for. 

Query ID         lcl|40585
Description      None
Molecule type    amino acid
Query Length     806

Subject ID       40587
Description      None
Molecule type    amino acid
Subject Length   820

Program          BLASTP 2.2.22+



Reference 

Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, 
and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search 
programs", Nucleic Acids Res. 25:3389-3402. 
Reference - compositional score matrix adjustment 

Stephen F. Altschul, John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. 
Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution 
matrices", FEBS J. 272:5101-5109. 




Search Parameters 



Search parameter name      Search parameter value

Program                    blastp
Word size                  3
Expect value               10
Hitlist size               100
Gapcosts                   11,1
Matrix                     BLOSUM62
Filter string              F






Genetic Code               1
Window Size                40
Threshold                  11
Composition-based stats    2

Karlin-Altschul statistics

Params    Ungapped    Gapped

Lambda    0.318991    0.267
K         0.133935    0.041
H         0.404134    0.14

Results Statistics

Results Statistics parameter name    Results Statistics parameter value

Effective search space               595935
Graphic Summary 

Distribution of 12 Blast Hits on the Query Sequence 






An overview of the database sequences aligned to the query sequence is shown. The score of each 
alignment is indicated by one of five different colors, which divides the range of scores into five groups. 
Multiple alignments on the same database sequence are connected by a striped line. Mousing over a hit 
sequence causes the definition and score to be shown in the window at the top, clicking on a hit sequence 
takes the user to the associated alignments. New: This graphic is an overview of database sequences 
aligned to the query sequence. Alignments are color-coded by score, within one of five score ranges. Multiple 
alignments on the same database sequence are connected by a dashed line. Mousing over an alignment 
shows the alignment definition and score in the box at the top. Clicking an alignment displays the alignment 
detail. 



Color key for alignment scores

[Graphic not reproduced; axis ticks at 0, 150, 300, 450, 600, 750]






Dot Matrix View

Plot of lcl|40585 vs 40587

This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is 
represented on the X-axis and the numbers represent the bases/residues of the query. The subject is 
represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments 
are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper 
right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines 
shown in the plot is the same as the number of alignments found by BLAST. 




Descriptions 

                                                 Score     E
Sequences producing significant alignments:      (Bits)  Value

lcl|40587 unnamed protein product                 57.8   3e-12



Alignments

>lcl|40587 unnamed protein product 
Length=820 

Score = 57.8 bits (138), Expect = 3e-12, Method: Compositional matrix adjust. 
Identities = 45/165 (27%), Positives = 72/165 (43%), Gaps = 28/165 (16%) 

Query 634 DQGDYVCSAQDKKTKKRHCLVKQLIILERMA--PMITGNLE-NQTTTIGETIEVTCPASG 690

D+G+Y C +++ H QL ++ER P++ L N+T +G +E C 

Sbjct 224 DKGNYTCIVENEYGSINHTY--QLDWERSPHRPILQAGLPANKTVALGSNVEFMCKVYS 281 

Query 691 NPTPHITWFKDNET LVEDSGIVLRDGNRN-LTIRRVRKEDGGLYTC 735 

+P PHI W K E +++ +G+ D L +R V ED G YTC 

Sbjct 282 DPQPHIQWLKHIEVNGSKIGPDNLPYVQILKTAGVNTTDKEMEVLHLRNVSFEDAGEYTC 341 

Query 736 QACNVLGCARAET- LFI IEGAQEKTN LEVIILVGTAVI 772 

A N +G + L ++E +E+ LE+II A + 

Sbjct 342 LAGNSIGLSHHSAWLTVLEALEERPAVMTSPLYLEI IIYCTGAFL 386 



Score = 41.6 bits (96), Expect = 2e-07, Method: Compositional matrix adjust. 
Identities = 34/148 (22%), Positives = 62/148 (41%), Gaps = 8/148 (5%) 

Query 325 RVHTKPFIAFGSGMKSLVEATVGSQ - VRI PVKYLSYPAPDI KWYRNGRPI ESN YT 378 

R+ P+ M+ + A ++ V+ P P + +W +NG+ + + Y 

Sbjct 148 RMPVAPYVTrSPEKMEKKLHAVPAAKTVKFKCPSSGTPNPTLRWLKNGKEFKPDHRIGGYK 207 

Query 379 MIVGDELTIME- VTERDAGNYTVILTNPISMEKQSHMVSLVVNVPPQIGEKALISPMNSY 437 

+ IM+ V D GNYT 1+ N ++ + +V P + +A + + 

Sbjct 208 VRYATWS I IMDS WPSDKGNYTC I VENEYGS INHTYQLDWERS PHRPI LQAGLPANKTV 267 

Query 438 Q YGTMQTLTCTVYANPPLHH I QWYWQLE 465 

G+ C VY++ P HIQW +E 

Sbjct 268 ALGSNVEFMCKVYSD-PQPHIQWLKHIE 294 



Query       TLTIESVTKSDQGEYTCVASSGRMIKRNRTFV RVHTKPFIAFGSGMKSLVEATVG  347
            ++ ++SV SD+G YTC+ + N T+ R +P + +G+ + +G
Sbjct       S I IMDS WPSDKGNYTC I VEN - EYGS INHTYQLDWERS PHRPILQ - -AGLPANKTVALG  270

Query       SQVRIPVKYLSYPAPDIKWYRNGRPIESNYTMIVGDELTIMEVTE  392
            S V K S P P I+W ++ IE N + I D L +++ +
Sbjct       SNVEFMCKVYSDPQPHIQWLKH- - - IEVNGSKIGPDNLPYVQILKTAGVNTTDKEMEVLH  327

Query       RDAGNYTVILTNPISMEKQSHMVSLV  418






Sbjct  328  [residues not legible]

Score = 38 [remainder of line illegible]
Identities = [illegible]

Query  679  GETIEVTCPASGNPTPHITWFKDNETLVED SGIVLRDGNRNLTIRRVRKEDGGLYTC  735
            +T++ CP+SG PP+WK++ D G+R +++ v DG YTC
Sbjct  171  AKTVKFKCPSSGTPNPTLRWLKNGKEFKPDHRIGGYKVRYATWSIIMDSWPSDKGNYTC  230

Query  736  QACNVLG  742
            N G
Sbjct  231  IVENEYG  237

Score = 33 [remainder of line illegible]
Identities = [illegible]

Query  364  IKWYRNGRPI-ESNYTMIVGDELTIMEVTERDAGNYTVILTNPISMEKQSHMVSLWWVP  422
            I W R+G + ESN T I G+E+ + + D+G Y + ++P S VNV
Sbjct  64   INWLRDGVQLAESNRTRITGEEVEVQDSVPADSGLYACVTSSP SGSDTTYFSVNVS  119

Query  423  PQIGEKALIS PMNS YQYGTMQTLTCTVYANPPLHH I QWYWQLEEAC SYR  471
            + + +T P+YWE +
Sbjct  120  DALPSSEDDDDDDDSSSEEKETDNTKPNRMP VAPYWTSPEKMEKKLHAVPAAKTVK  175

Query  472  PGQTSPYACKEW-RHVEDFQGGNKIEVTKNQYALIEGKNKTVSTLVIQAANVS--AL  525
            P +P W ++ ++F+ ++I K +YA ++++ + S
Sbjct  176  FKCPSSGTPNPTLRWLKNGKEFKPDHRIGGYKVRYA TWS I IMDS WPSDKGN  227

Query  526  YKCEAINKAGRGERVISFHVI-RGPEITVQPAAQPTEQ ESVSLLCTADRNTFENL  579
            Y C N+ G V+ R P + A P + +V +C + + +
Sbjct  228  YTCIVENEYGSINHTYQLDWERSPHRPILQAGLPANKTVALGSNVEFMCKVYSDPQPHI  287

Query  580  TWYKLGSQATSVHM- - -GESLTPVCKNLDALWKLNGTMFSNSTNDILIVAFQNASLQDQG  636
            W K H+ G + P NL + L ++++++ +n S +D G
Sbjct  288  QWLK HIEVNGSKIGP--DNLPYVQILKTAGVNTTDKEMEVLHLRNVSFEDAG  337

Query  637  [residues not legible]  642
Sbjct  338  [residues not legible]  343

Score = 24.[illegible], Expect = 0.021, Method: Compositional matrix adjust.
Identities = [illegible], Positives = 20/36 (55%), Gaps = 0/36 (0%)

Query  291  LSTLTIESVTKSDQGEYTCVASSGRMIKRNRTFVRV  326
            + L + +V+ D GEYTC+A + + + ++ V
Sbjct  323  MEVLHLRNVSFEDAGEYTCLAGNSIGLSHHSAWLTV  358

Score = 21.6 bits (44), Expect = 0.21, Method: Compositional matrix adjust.
Identities = 19/75 (25%), Positives = 29/75 (38%), Gaps = 12/75 (16%)

Query  49   LQITCRGQRD LDVTLWPNAQRDSEERVLVTECGGGDSIFCKTLTIPRWGNDTGAYKC  105
            LQ+ CR + D ++WL Q R +T G+ + + V D+G Y C
Sbjct  51   LQLRCRLRDDVQS INWLRDGVQLAESNRTRIT GEEV EVQDSVPADSGLYAC  101

Query  106  SYRDVDIASTVYVYV  120
            + T Y V
Sbjct  102  VTSSPSGSDTTYFSV  116

Score = 21.2 bits (43), Expect = 0.27, Method: Compositional matrix adjust. 
Identities = 11/40 (27%) , Positives = 16/40 (40%) , Gaps = 2/40 (5%) 

Query 696 ITWFKDNETLVEDSGIVLRDGNRNLTIRRVRKEDGGLYTC 735 

IW+DLE+ R +++ D GLY C 

Sbjct 64 INWLRDGVQLAESNRT- -RITGEEVEVQDSVPADSGLYAC 101 

Score = 18.9 bits (37), Expect = 1.4, Method: Compositional matrix adjust. 
Identities = 9/22 (40%), Positives = 13/22 (59%), Gaps = 2/22 (9%) 

Query 247 NCTARTELNVGLDFTWHSPPSK 268 

NCT EL + + WH+ PS+ 
Sbjct 722 NCT--NELYMMMRDCWHAVPSQ 741 






Score = 18.9 bits (37), Expect = 1.4, Method: Compositional matrix adjust. 
Identities = 6/16 (37%), Positives = 8/16 (50%), Gaps = 0/16 (0%)

Query 468 CSYRPGQTSPYACKEW 483 

C+ RP TP + W 
Sbjct 19 CTARPSPTLPEQAQPW 34 



Score = 18.1 bits (35), Expect = 2.2, Method: Compositional matrix adjust. 
Identities = 18/75 (24%), Positives = 29/75 (38%), Gaps = 12/75 (16%) 

Query 37 QKDILTILANTTLQITCRG QRDLDWLWPNAQRDSEERVLVTECGGGDSIFCKTLTI 92 

+K + + A T++ C L WL + + R+ GG + T +1 

Sbjct 162 EKKLHAVPAAKTVKFKCPSSGTPNPTLRWLKNGKEFKPDHRI GG YKVRYATWS I 215 

Query 93 - - PRWGNDTGAYKC 105 

W +D G Y C 
Sbjct 216 IMDSWPSDKGNYTC 230



Score = 16.5 bits (31), Expect = 6.2, Method: Compositional matrix adjust. 
Identities = 6/12 (50%), Positives = 8/12 (66%), Gaps = 0/12 (0%)

Query  673  NQTTTIGETIEV  684
            N+T   GE +EV
Sbjct  77   NRTRITGEEVEV  88






